FACOS: Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis

Kien Luong*, Mohammad Hadi†, Ferdian Thung*, Fatemeh Fard†, and David Lo*
* School of Computing and Information Systems, Singapore Management University
† Irving K. Barber Faculty of Science, University of British Columbia
{kiengialuong, ferdianthung, davidlo}@smu.edu.sg
{mohammad.hadi, fatemeh.fard}@ubc.ca

arXiv:2111.07238v1 [cs.SE] 14 Nov 2021

Abstract: Collecting API examples, usages, and mentions relevant to a specific API method over discussions on venues such as Stack Overflow is not a trivial problem. It requires effort to correctly recognize whether the discussion refers to the API method that developers/tools are searching for. The content of the thread, which consists of both text paragraphs describing the involvement of the API method in the discussion and the code snippets containing the API invocation, may refer to the given API method. Leveraging this observation, we develop FACOS, a context-specific algorithm to capture the semantic and syntactic information of the paragraphs and code snippets in a discussion. FACOS combines a syntactic word-based score with a score from a predictive model fine-tuned from CodeBERT. FACOS beats the state-of-the-art approach by 13.9% in terms of F1-score.

I. INTRODUCTION

Developers typically use existing libraries or frameworks to implement certain common functionalities. Understanding which APIs to use, the methods they offer, their distinctive names, and how to use them is vital in this regard. There may be hundreds or even thousands of APIs in a large-scale software library such as the .NET framework and JDK. Microsoft conducted a survey in 2009 in which 67.6% of respondents said that inadequate or absent resources hindered learning APIs [1].

In order to gain a deeper understanding of APIs and their usage information, developers need to inspect many web pages manually and use automated code search tools. Most code search tools do not consider the semantics of natural language queries because they are based on keyword matching. Stack Overflow is the second most common place for developers to discover APIs, their simple method names, and their usage through crowd-sourced questions and answers. As many APIs share simple names but provide different functionality, it is difficult to find code snippets and APIs that correspond to the specific problem searched by developers on these platforms. Moreover, API mentions in the informal text content of Stack Overflow are often ambiguous, which makes it difficult to track down APIs and learn their uses.

Developers frequently discuss and mention APIs in natural language in online discussion and question answering forums like Stack Overflow [2]–[5]. When developers or automated tools are looking for a specific API, API methods sharing the same name can be ambiguous. Therefore, we require API disambiguation to support several downstream tasks such as API recommendation and API mining. To properly index and link APIs to their related information in various sources (e.g., Stack Overflow, Javadoc, etc.), it is important to link ambiguous API mentions to their actual APIs correctly.

Luong et al. recently proposed DATYS [6], which uses type scoping to disambiguate API mentions in informal text content on Stack Overflow. In type scoping, they considered API methods whose types appear in more parts (i.e., scopes) of a Stack Overflow thread as more likely to be the API method being referred to. However, the statistical word alignment model it uses is based on the appearance of words in a sentence, rather than on the context in which the sequence of words is used and the connotations these words convey to readers.

APIs are often discussed and mentioned in natural language in online forums such as Stack Overflow to better understand them. Developers or automated tools looking for a specific API would be confused by API methods with the same name. In Stack Overflow, API disambiguation is crucial to finding APIs. It supports several downstream tasks, such as API recommendation [7], [8] and API mining [7], [8], which rely on interpreting API mentions correctly to index and link APIs to relevant information in various data sources, including Stack Overflow, Javadoc, etc.

To incorporate a deeper understanding of the underlying semantics in the natural language text content of Stack Overflow, we introduce FACOS, a context-specific algorithm to capture the semantic and syntactic information of the paragraphs and code snippets in a crowd-sourced discussion. We call this the API resource retrieval task because FACOS focuses on finding Stack Overflow threads mentioning a given API method. Our work also modifies DATYS to perform a better search over the code snippets in Stack Overflow discussion threads. The modified DATYS, denoted as DATYS+, provides an additional metric to better capture the occurrence of the API method type in the Stack Overflow thread. By greedily matching the type name with the tokens in the code snippet, DATYS+ performs the syntactic search in FACOS. Yet, both DATYS and DATYS+ search only on the syntactic information provided by the fully qualified name of a target API method. They cannot capture the semantic meaning in paragraphs and code snippets of threads on Stack Overflow and how similar they are to the target API method. Thus, to capture the semantics,
in addition to the weighted syntactic information provided by DATYS+, FACOS has a semantic search component that leverages a deep attention-based Transformer model, CodeBERT [9]. This semantic search component measures the similarity between the paragraphs and code snippets of a Stack Overflow thread and the target API method's comment and implementation code. The more similar they are, the more likely the thread is to be relevant to the target API method that we search for. To efficiently leverage both semantic and syntactic knowledge of the Stack Overflow thread and the API method, FACOS joins the semantic and syntactic search elements to obtain a joint relevance score that determines whether a thread relates to a given API method. The contribution of each element to the joint relevance score is defined by a weighting factor.

In this paper, we answer the following research questions:

RQ1: Can FACOS perform better than the baseline (DATYS)?
RQ2: How well does each component of FACOS perform?
RQ3: How does the weighting factor affect the F1-score of FACOS?

These research questions help us understand the effectiveness of our approach, FACOS, and the internal mechanism through which it yields better results than the current baseline. Our work offers the following main contributions:

1) To our knowledge, we are the first to adopt a transformer-based deep learning technique to incorporate semantic knowledge understanding for the API resource retrieval task.

2) Compared to state-of-the-art techniques, our approach performs better when searching for content related to the queried API. On a dataset of 380 Stack Overflow threads, FACOS beats the state-of-the-art by 13.9%.

3) We designed an ablation study to understand how the integrated components of our approach perform. We found that each component contributes to the effectiveness of FACOS.

4) We have open sourced our code and the additional artifacts required for recreating the results and re-purposing our approach for other tasks. The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

The rest of the paper is structured as follows. Section II covers the preliminary knowledge about the components on top of which we have built our method. Section III provides an overview of our proposed approach, while Section IV elaborates on its components. We describe our experiment details and results in Sections V and VI, respectively. The related work and the threats to validity are presented in Sections VIII and VII-D, respectively. Finally, we conclude our work and present future work in Section IX.

II. PRELIMINARIES

A. DATYS

Two steps are involved in finding the API mentioned in informal text content: (1) API mention extraction and (2) API mention disambiguation. API mention extraction aims to identify common words that refer to APIs. API mention disambiguation, on the other hand, links API mentions with the APIs they reference. DATYS [6] specifically deals with API mention disambiguation via type scoping in the informal text of Stack Overflow, resolving ambiguous mentions of Java API methods where the mentions have already been identified.

After extracting API method candidates from the input Java libraries, DATYS scores the candidates based on how often their types (i.e., classes or interfaces) appear in different parts (i.e., scopes) of the Stack Overflow thread with identified API mentions. Having a type that appears in more scopes increases the API candidate's score. DATYS considers three scopes: Mention scope, which covers the mention itself; Text scope, which covers the textual content of the thread, including the mentions; and Code scope, which covers the code snippets in the thread. API candidates are ranked according to their scores for each API mention in the thread. DATYS takes the top API candidate with a non-zero score as the mentioned API. If the leading API candidate has a zero score, DATYS considers the mention an unknown API. Luong et al. built a ground truth dataset containing 807 Java API mentions from 380 threads in Stack Overflow.
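
The selection rule described above (take the top-ranked candidate unless its score is zero) can be illustrated with a minimal sketch. The function name `pick_mentioned_api` and the dictionary layout are hypothetical, and the text does not say how ties are broken; here the first candidate with the maximum score wins.

```python
from typing import Dict, Optional


def pick_mentioned_api(scores: Dict[str, int]) -> Optional[str]:
    """Return the top-scoring API candidate, or None for an unknown API.

    `scores` maps a candidate's fully qualified name to its type-scoping score.
    Tie-breaking is an assumption: max() keeps the first maximum it encounters.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # A leading candidate with a zero score means the mention is treated as unknown.
    return best if scores[best] > 0 else None
```

For example, a mention whose candidates all score zero would be reported as unknown rather than linked to an arbitrary API.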

B. CodeBERT

CodeBERT [9] was developed using a multilayered attention-based Transformer model, BERT [10]. Because of its effectiveness in learning contextual representations from massive unlabeled text with self-supervised objectives, the BERT model has been widely adopted to develop large pre-trained models. Building on the multilayer Transformer [11], the CodeBERT developers adopted two approaches that differ from BERT to learn semantic connections between Natural Language (NL) and Programming Language (PL) more effectively.

Firstly, the CodeBERT developers make use of both bimodal instances of NL-PL pairs (i.e., code snippets and function-level comments or documentation) and a large amount of available unimodal code. In addition, the developers pre-trained CodeBERT using a hybrid objective function, which includes masked language modeling [10] and replaced token detection [12]. The incorporation of unimodal code helps the replaced token detection task, which in turn produces better natural language by detecting plausible alternatives sampled from generators.

The developers trained CodeBERT on GitHub code repositories in six programming languages. Only one pre-trained model is learned for all six languages, with no explicit indicator marking which of the six languages an input instance belongs to. CodeBERT was evaluated on two downstream tasks: natural language code search and code documentation generation. The study found that fine-tuning the parameters of CodeBERT achieved state-of-the-art results on both tasks.

[Figure 1 diagram labels: User or Tool; A given API method; Query using the FQN of the given API method; Step 1; Potential threads; API Candidates; API comment & API implementation code; Step 2; FACOS (DATYS+, API Relevance Classifier); Predicted relevant threads; Return]

Fig. 1: The architecture of FACOS (Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis).

III. APPROACH OVERVIEW

A. Task Definition

Our goal is to find Stack Overflow threads that mention a given API method.¹ Specifically, given an API method, we strive to find Stack Overflow threads containing words matching the simple name of the given API method. In Java, the simple name of an API method is the name of the method without the class and package names. For example, m is the simple name of the API method com.example.Class.m. We want to classify whether a thread containing the simple name m is actually relevant to the API method com.example.Class.m. In summary, the task is defined as: "For each API method in a set of given API methods, identify Stack Overflow threads that refer to it."

B. Architecture

The pipeline of FACOS is presented in Figure 1. It is divided into two main steps: (1) collecting various API-related resources from a given API method name and (2) recommending relevant threads using the collected API-related resources.

In step (1), FACOS finds Potential Threads from Stack Overflow using the simple name of the given API method as the query. Potential Threads are the threads that have at least one word matching the simple name of the given API. The API method comment and implementation code are obtained directly from the source code repository of the given API. Last but not least, the API Candidates are obtained from a database of API methods. The API Candidates are API methods that have the same simple name as the given API method.

¹ In this paper, we use the terms API and API method interchangeably.

The objective of step (2) is to identify whether each Stack Overflow thread in the Potential Threads actually refers to the given API. FACOS has two components: the API relevance classifier and DATYS+. The API relevance classifier is designed to draw the relevance between a thread and an API method by capturing the semantic similarity between (1) the paragraphs and code snippets in the thread and (2) the API method comment and implementation code. The API relevance classifier outputs a semantic relevance score representing the relevance it measures. In contrast, DATYS+ outputs a syntactic relevance score based on the existence of the terms from the fully qualified name of the given API in different scopes of a Stack Overflow thread. For example, API "A.B.C" has the terms "A", "B", and "C". The last term, "C", is the simple name of the API method. The second-to-last term, "B", is the type of the API method. Both DATYS and DATYS+ use type scoping [6] to assign a score based on the existence of the type of the API (i.e., "B" in the example) in different scopes of the Stack Overflow thread (code scope, text scope, etc.). However, the type scoping of DATYS+ is modified to suit the search task, as described in Section IV-A. It outputs a score that indicates the syntactic relevance between the given API and the thread; we call it the DATYS+ score.
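
To make the terminology concrete, the following minimal sketch splits a fully qualified name into the terms used above. The `ApiName` type and `split_fqn` function are hypothetical names introduced for illustration, and the sketch assumes a plain dotted name of the form package.Type.method (nested classes and overload signatures are not handled).

```python
from typing import NamedTuple


class ApiName(NamedTuple):
    package: str      # everything before the type, e.g. "com.example"
    type_name: str    # second-to-last term: the class or interface
    simple_name: str  # last term: the method name


def split_fqn(fqn: str) -> ApiName:
    """Split a dotted fully qualified method name into package, type, and simple name."""
    parts = fqn.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least Type.method, got {fqn!r}")
    return ApiName(".".join(parts[:-2]), parts[-2], parts[-1])
```

With this split, "A.B.C" yields the type "B" and the simple name "C", matching the example in the text.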

After step (2), each thread has a score indicating whether the thread refers to the given API method. This score combines the semantic relevance score and the DATYS+ score and is called the joint relevance score. Threads predicted as referring to the given API method are then returned to the user. We describe the FACOS components (i.e., DATYS+ and the API relevance classifier) in detail in Section IV.

IV. FACOS

FACOS consists of two main components: DATYS+ and the API relevance classifier. DATYS+ takes as inputs the Potential Threads and API Candidates and outputs scores indicating its confidence that the given API is referred to in the threads (Section IV-A). Given the Potential Threads and the API method comment and implementation code, FACOS first converts them to API relevance embeddings (Section IV-B). The API relevance embeddings are input to the API relevance classifier, which outputs confidence scores indicating the likelihood that the given threads refer to the API (Section IV-C). Finally, the scores from DATYS+ and the API relevance classifier are combined into a joint relevance score, and threads with scores larger than a threshold are returned as the relevant threads (Section IV-D).

A. DATYS+

DATYS+ is an extension of DATYS. DATYS used regular expressions to capture the types of API method invocations available in the code snippets of the thread. However, these regular expressions are limited, and thus DATYS may miss some mentions in code snippets. To capture more types, DATYS+ modifies the type scoping algorithm by adding a new score.

Algorithm 1 shows how the modified type scoping works. Compared to that of DATYS, the type scoping algorithm of DATYS+ receives CodeSnippets as an additional input. CodeSnippets represents the content available in the code snippets of the Stack Overflow thread. The inputs of the original type scoping algorithm are also used: ApiMention, PTypesList, APIMethodCandidate, and ThreadContent stand for the simple name of the given API, the list of possible types extracted from code snippets following the algorithm used by DATYS, the API candidate, and the thread's textual content (i.e., title, text, tags), respectively. The three scopes used by DATYS are also used in DATYS+. In Mention Scope (Lines 3-8), DATYS+ increases an API score if its type appears within the API mention. In Text Scope (Lines 10-13), DATYS+ increases an API score if its type appears within the textual content of the thread. In Code Scope (Lines 17-21), DATYS+ increases an API score if its type matches the type of a method invocation or an imported type in the code snippet. Additionally, DATYS+ looks at the content of the code snippets and increases the score of the corresponding API candidate if there are tokens in the code snippets that match the API type (Lines 14-16). This score helps to capture occurrences of types that would be missed by the stricter matching used in DATYS. Thus, we call the scope of this score Extended Code Scope.

After executing type scoping, DATYS+ returns scores for the API Candidates. The scores are then normalized to the range [0, 1] using the minimum and maximum scores among the API Candidates. DATYS+ then takes the normalized score of the given API method and passes it to the next step.
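
The scoring and normalization described in this section can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the authors' implementation: the helpers the algorithm leaves unspecified (getType, hasPrefix, getPrefix, tokenize, isSameType) are replaced by simple assumptions, namely dotted-name splitting, an identifier-based tokenizer, and comparison of simple type names. The handling of the case where all candidates share the same score is also an assumption.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Assumed tokenizer: the set of identifier-like tokens in the text."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def score_candidate(api_mention: str, p_types: List[str], candidate_fqn: str,
                    thread_content: str, code_snippets: str) -> int:
    """Score one API candidate across the four scopes of DATYS+ type scoping."""
    score = 0
    cand_type = candidate_fqn.split(".")[-2]  # assumed getType: second-to-last term
    # Mention scope: the mention carries a prefix such as "List" in "List.add".
    if "." in api_mention:
        prefix = api_mention.rsplit(".", 1)[0]
        if prefix.endswith(cand_type):
            score += 1
    # Text scope: the type appears in the thread's textual content.
    if cand_type in tokenize(thread_content):
        score += 1
    # Extended code scope: the type appears as a token in the code snippets.
    if cand_type in tokenize(code_snippets):
        score += 1
    # Code scope: the type matches a type extracted from invocations or imports.
    for p_type in p_types:
        if p_type.split(".")[-1] == cand_type:  # assumed isSameType
            score += 1
    return score


def normalize(scores: Dict[str, int]) -> Dict[str, float]:
    """Min-max normalize candidate scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: the text does not cover identical scores; map them all to 0.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}
```

Only the normalized score of the given API method would be passed on; the other candidates serve to establish the minimum and maximum.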

Algorithm 1: Scoring an API Candidate with Type Scoping in DATYS+
Input: ApiMention, PTypesList, APIMethodCandidate, ThreadContent, CodeSnippets
Output: CandScore
 1: CandScore = 0
 2: CandType = getType(APIMethodCandidate)
 3: if hasPrefix(ApiMention) then
 4:     Prefix = getPrefix(ApiMention)
 5:     if endsWith(Prefix, CandType) then
 6:         CandScore = CandScore + 1
 7:     end if
 8: end if
 9: TextualTokens = tokenize(ThreadContent)
10: CodeTokens = tokenize(CodeSnippets)
11: if CandType in TextualTokens then
12:     CandScore = CandScore + 1
13: end if
14: if CandType in CodeTokens then
15:     CandScore = CandScore + 1
16: end if
17: for PType in PTypesList do
18:     if isSameType(PType, CandType) then
19:         CandScore = CandScore + 1
20:     end if
21: end for
22: return CandScore

B. API relevance embedding

We follow the process described in Figure 2 to build the API relevance embedding. Firstly, each thread in the Potential Threads needs to be converted into an embedding. A thread may contain m paragraphs and n code snippets. A paragraph is a piece of textual content on a Stack Overflow thread that is separated from other content in the thread by a newline character. A code snippet is a piece of code content on a Stack Overflow thread. It is typically enclosed by a starting tag 〈pre〉〈code〉 and an ending tag 〈/code〉〈/pre〉. Each paragraph is paired with each code snippet to create a thread content pair. Therefore, a Stack Overflow thread has m × n thread content pairs. A natural-programming language model, CodeBERT², is used to extract the semantic meaning of each thread content pair. It encodes the m × n thread content pairs into m × n thread embeddings; a thread embedding is the representation vector of a thread content pair created by CodeBERT's encoder. By converting the pairs from a textual form to a numerical vector form with a pre-trained CodeBERT model, the semantic relationship between the paragraphs and code snippets is extracted. Before the thread content pairs are fed into the encoder of CodeBERT, each pair is pre-processed into the following format:

〈CLS〉 paragraph 〈SEP〉 code snippet 〈EOS〉

〈CLS〉 is the token that marks the start of the pair, following the design of the RoBERTa model [13], on which CodeBERT is based. 〈SEP〉 is the token that separates a paragraph from a code snippet, and 〈EOS〉 indicates the end of the pair. The maximum number of tokens in a pair fed into the CodeBERT encoder is 512. We set the number of tokens for a paragraph and a code snippet to 254 and 255, respectively. Together with the three special tokens 〈CLS〉, 〈SEP〉, and 〈EOS〉, these add up to 512. If the number of tokens in the paragraph is less than 254, padding tokens are added to reach 254 tokens. On the other hand, if the number of tokens in the paragraph is more than 254, we truncate the paragraph and take the first 254 tokens. The same process is applied to the code snippet with 255 tokens. The CodeBERT encoder receives the thread content pairs in this format as inputs and outputs embedding vectors. For a thread with m × n thread content pairs, m × n thread embedding vectors are created, and each thread embedding vector has a length of 768.

² https://github.com/microsoft/CodeBERT
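
The token budget described above can be checked with a small sketch that assembles one pair from already-tokenized input. The marker strings `<CLS>`, `<SEP>`, `<EOS>`, and `<PAD>` and the function names are placeholders chosen for illustration; an actual CodeBERT tokenizer has its own special tokens and padding behaviour, which this sketch does not attempt to reproduce.

```python
from typing import List

# Placeholder marker strings (assumptions, not the tokenizer's real vocabulary).
CLS, SEP, EOS, PAD = "<CLS>", "<SEP>", "<EOS>", "<PAD>"
MAX_PARAGRAPH_TOKENS = 254
MAX_CODE_TOKENS = 255


def fit(tokens: List[str], length: int) -> List[str]:
    """Keep the first `length` tokens, or right-pad with PAD up to `length`."""
    return tokens[:length] + [PAD] * max(0, length - len(tokens))


def build_pair(paragraph_tokens: List[str], code_tokens: List[str]) -> List[str]:
    """Assemble <CLS> paragraph <SEP> code snippet <EOS> with a fixed length of 512."""
    return (
        [CLS]
        + fit(paragraph_tokens, MAX_PARAGRAPH_TOKENS)
        + [SEP]
        + fit(code_tokens, MAX_CODE_TOKENS)
        + [EOS]
    )
```

Whatever the input lengths, the result has 1 + 254 + 1 + 255 + 1 = 512 tokens, which is the stated maximum for the encoder.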

Secondly, to build the API relevance embedding, the API comment and implementation code also need to be converted into an embedding. The API method comment is a piece of textual content that describes the functionality of the API method and how to use it. The API implementation code is the code inside the API method body that implements the described functionality. The API comment and implementation code are extracted from the Javadoc and the JAR files, respectively. They are pre-processed into the following format:

〈CLS〉 comment 〈SEP〉 implementation code 〈EOS〉

They are then transformed into a numerical representation vector via the CodeBERT encoder.

[Figure 2 diagram labels: A thread with m paragraphs and n code snippets gives m × n thread content pairs; a given API method with its comment and implementation code gives 1 method content pair; CodeBERT produces m × n thread embeddings and 1 method embedding (each 768d); concatenation gives m × n API relevance embeddings (each 1536d), labelled by whether the thread refers to the given API method]

Fig. 2: How API relevance embeddings are created.

Finally, each thread embedding vector is concatenated with the method embedding vector. We call this concatenated vector the API relevance embedding. In total, m × n API relevance embedding vectors are created.
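
The pairing and concatenation steps can be sketched as below. The encoder is deliberately replaced by `toy_encoder`, a hypothetical stand-in that returns a dummy 768-dimensional vector, so the example shows only the bookkeeping (m × n pairs, 768 + 768 = 1536 dimensions) and not real CodeBERT embeddings.

```python
from itertools import product
from typing import Callable, List

Vector = List[float]
EMBEDDING_DIM = 768  # length of one encoder output, as stated in the text


def toy_encoder(text: str, code: str) -> Vector:
    """Hypothetical stand-in for the CodeBERT encoder; returns a dummy vector."""
    return [float(len(text) + len(code))] * EMBEDDING_DIM


def api_relevance_embeddings(
    paragraphs: List[str],
    snippets: List[str],
    comment: str,
    implementation: str,
    encode: Callable[[str, str], Vector] = toy_encoder,
) -> List[Vector]:
    """Concatenate each of the m x n thread embeddings with the single method embedding."""
    method_embedding = encode(comment, implementation)
    return [
        encode(paragraph, snippet) + method_embedding
        for paragraph, snippet in product(paragraphs, snippets)
    ]
```

A thread with 2 paragraphs and 3 code snippets would therefore yield 6 vectors of length 1536.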

C. API relevance classifier

The API relevance classifier is a binary classifier that uses a neural network with two fully connected layers to predict whether an API relevance embedding comes from a Stack Overflow thread that refers to the given API method.

The API relevance classifier has two modes of operation: training and deployment. In the training mode, the API relevance embeddings are used to train the API relevance classifier. When there is an imbalance between positive and negative labels, the API relevance classifier upsamples the minority label. Whenever the thread refers to the given API method, all API relevance embeddings created from the thread are considered positive by the classifier. Otherwise, if the given API method is not referred to by the thread, every API relevance embedding of the thread has a negative label.
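
One simple way to realize the upsampling step is random duplication of minority-class examples until both classes have the same size. The sketch below assumes sampling with replacement and a fixed seed; the text only says that the minority label is upsampled, so these details and the function name are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def upsample_minority(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Duplicate random minority-class examples until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (
        (positives, negatives) if len(positives) < len(negatives) else (negatives, positives)
    )
    if not minority:
        # Nothing to sample from; return the data unchanged.
        return list(embeddings), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = list(range(len(labels))) + extra
    return [embeddings[i] for i in indices], [labels[i] for i in indices]
```

Upsampling should only be applied to the training data, so that evaluation reflects the original class distribution.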

[Figure 3 diagram labels: Input (a thread & a given API); DATYS+ outputs score A of the given API method; the API relevance classifier outputs the average probability B of the positive class; the two are combined as joint relevance score C = x·A + (1 − x)·B; if C > threshold, the thread refers to the API (Yes), otherwise the thread does not refer to the API (No)]

Fig. 3: Computing joint relevance score.

In the deployment mode, the API relevance classifier produces probability scores for the m × n API relevance embeddings. These scores are averaged and passed to the next step. The averaged score indicates the likelihood that the thread refers to the given API.

D. Computing joint relevance score

We follow the process in Figure 3 to compute the joint relevance score. DATYS+ and the API relevance classifier output scores A and B, respectively. Both represent their confidence that the given API method is mentioned in the thread. The two scores are then combined into a joint relevance score C using the following formula:

$$C = x \times A + (1 - x) \times B \tag{1}$$

The weighting factor x decides the contributions of the DATYS+ score and the API relevance classifier score to the joint relevance score C. The higher the value of x, the more the DATYS+ score contributes to the final joint relevance score. A, B, and x all range from 0 to 1. A thread is considered to refer to the given API if the joint relevance score C is larger than a threshold t. Otherwise, the thread is considered not to refer to the given API. By default, t is set to 0.5.
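
Equation (1) and the threshold test translate directly into code. The sketch below takes the normalized DATYS+ score A and the per-embedding classifier probabilities, averages the latter to obtain B as in the deployment mode, and applies the default threshold of 0.5. The function name and argument names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def joint_relevance(
    datys_plus_score: float,
    classifier_probs: Sequence[float],
    x: float,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Return the joint relevance score C and whether the thread is predicted relevant."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("weighting factor x must lie in [0, 1]")
    b = mean(classifier_probs)                 # average probability of the positive class
    c = x * datys_plus_score + (1 - x) * b     # Equation (1)
    return c, c > threshold                    # strictly larger than the threshold
```

For instance, with A = 1.0, probabilities 0.25 and 0.75 (so B = 0.5), and x = 0.5, the joint score is 0.75 and the thread is predicted to refer to the API. A score exactly equal to the threshold is treated as not relevant, since the text requires C to be larger than t.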

The value of x is estimated from the training data. In detail, we let x increase gradually from 0 to 1 with a step of 0.1. There are eleven possible values of x: 0, 0.1, 0.2, ..., 0.9, 1.0. The value of x giving the highest performance on the training data is then chosen.

TABLE I: Number of API relevance embeddings in each set

Set            API relevance embeddings
Training set   57,690
Testing set    26,212
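
The search for the weighting factor x described in Section IV-D can be sketched as a small grid search. The sketch makes two simplifying assumptions that the text does not state: F1 is computed over all (thread, API) decisions at once rather than averaged per API, and ties are resolved in favour of the smallest x. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def f1_score(preds: Sequence[bool], golds: Sequence[bool]) -> float:
    """F1 over boolean predictions; defined as 0 when there are no true positives."""
    tp = sum(p and g for p, g in zip(preds, golds))
    fp = sum(p and not g for p, g in zip(preds, golds))
    fn = sum(g and not p for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_x(
    a_scores: Sequence[float],
    b_scores: Sequence[float],
    golds: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Try x = 0.0, 0.1, ..., 1.0 and return the x with the highest F1 and that F1."""
    best_x, best_f1 = 0.0, -1.0
    for step in range(11):
        x = step / 10
        preds = [x * a + (1 - x) * b > threshold for a, b in zip(a_scores, b_scores)]
        score = f1_score(preds, golds)
        if score > best_f1:  # strict comparison keeps the smallest x on ties
            best_x, best_f1 = x, score
    return best_x, best_f1
```

Here `a_scores` and `b_scores` are the DATYS+ and averaged classifier scores for each training thread, and `golds` marks whether the thread truly refers to the API.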

V EXPERIMENT

A Dataset and Experimental Settings

We utilize the dataset provided in DATYS work [6] toevaluate both FACOS and DATYS We split 380 Stack Over-flow threads to 253 training threads and 127 testing threadswith the ratio of 21 The training threads are utilized totrain the API relevance classifier while the testing threads areused to evaluate FACOS and DATYS Next as mentioned inSection IV-B for each Stack Overflow thread in the trainingthreads we extract its thread embeddings and these threadembeddings are grouped into a training set Similarly for eachStack Overflow thread in the testing threads we extract itsthread embeddings and these thread embeddings are groupedinto a testing set The numbers of API relevance embeddingsof the dataset are shown in Table I The numbers of theembeddings in training set and testing set are 57 690 and26 212 respectively

To generate API relevance embeddings for the API rele-vance classifier for training for each thread if the given APIappears in the thread we generate API relevance embeddingsfor thread contents and method contents as described inSection IV-B These embeddings would have positive labelbecause they are created from the API that is referred to bythe thread To generate embeddings with a negative label fora thread we find APIs that have the same simple name asthe given API and are not mentioned in the thread We thencreate API relevance embeddings from these APIs and labelthese API relevance embeddings as negative

To train the API relevance classifier there are 344 APIsThese APIs are used to generate the training method em-beddings In the testing set there are 181 APIs These APIsare used to generate the testing method embeddings Table IIshows the numbers of positive and negative API relevanceembeddings created in training and testing sets The number ofnegative API relevance embeddings is approximately 4 timesmore compared to the positive ones in the same set of threadDue to this imbalance positive API relevance embeddings arerandomly up-sampled to balance the two classes within theAPI relevance classifier training process

The API relevance classifier is trained for 6 epochs on the training data; after these 6 epochs, the value of the loss function has largely converged. The learning rate is set to $10^{-3}$.

B. Metrics

TABLE II: Number of positive and negative API relevance embeddings in each set

Set        Positive Embeddings   Negative Embeddings
Training   9,934                 47,756
Testing    5,607                 20,605

To evaluate the proposed approach on identifying threads that are relevant to an API, we use three metrics: Precision, Recall, and F1-score. To calculate these metrics, True Positive, False Positive, and False Negative must be defined first. Our task focuses on finding threads that actually refer to a given API. A True Positive is a thread that is deemed relevant by the approach and is indeed relevant. A False Positive is a thread that is deemed relevant by the approach but is actually irrelevant. A False Negative is a thread that is deemed irrelevant by the approach but is actually relevant. The metrics are calculated using the following formulas:

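$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$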

We measure the above scores for every given API in the testing set and report the averages of the scores.
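The per-API scoring and averaging can be sketched as follows. The thread IDs and API names are made up for illustration, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than something the paper states.

```python
def prf(predicted, relevant):
    """Precision, recall and F1-score for one API, given sets of thread IDs."""
    tp = len(predicted & relevant)   # deemed relevant and indeed relevant
    fp = len(predicted - relevant)   # deemed relevant but actually irrelevant
    fn = len(relevant - predicted)   # deemed irrelevant but actually relevant
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_scores(per_api):
    """Average each metric over all APIs."""
    n = len(per_api)
    return tuple(sum(scores[i] for scores in per_api) / n for i in range(3))


# Hypothetical results: API -> (threads predicted relevant, threads truly relevant).
results = {
    "api_a": ({1, 2, 3}, {1, 2, 4}),
    "api_b": ({5}, {5, 6}),
}
per_api = [prf(pred, rel) for pred, rel in results.values()]
avg_p, avg_r, avg_f1 = average_scores(per_api)
print(round(avg_p, 4), round(avg_r, 4), round(avg_f1, 4))
```

Note that the average F1-score is the mean of the per-API F1-scores, so it generally differs from the F1-score computed from the average precision and average recall.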

C. Research Questions

Research Question 1: Can FACOS perform better than the baseline (DATYS)?

The baseline, DATYS, was designed for the task of API mention disambiguation. We adapt it to our task of finding threads that are relevant to an API: if DATYS finds that an API is mentioned in a thread, the thread is considered relevant to the API. To evaluate the improvement of FACOS over DATYS, we evaluate both on the testing set and compare them in terms of F1-score. We also analyze some cases that FACOS can resolve and DATYS cannot in Section VII-A.

Research Question 2: How well does each component of FACOS perform?

There are three possible variants of FACOS, depending on which components it includes: (1) FACOS with the API relevance classifier, (2) FACOS with DATYS+, and (3) FACOS with DATYS+ and the API relevance classifier. The API relevance classifier is a semantic-based algorithm, while DATYS+ is a syntactic-based algorithm. In this study, we aim to analyze the contribution of each component of FACOS. From the analysis, we would like to answer whether combining a semantic-based algorithm and a syntactic-based algorithm leads to a better result than running them individually.

Research Question 3: How does the weighting factor affect the F1-score of the relevant thread classification? Does our strategy work well?


TABLE III: FACOS vs. DATYS in terms of F1-score in the testing set

Approach   Avg. Precision   Avg. Recall   Avg. F1-score
DATYS      0.7441           0.7703        0.7340
FACOS      0.8697           0.9016        0.8730

TABLE IV: Contribution of FACOS Components

Components                                      Avg. Precision   Avg. Recall   Avg. F1-score
FACOS                                           0.8697           0.9016        0.8730
FACOS with only API relevance classifier        0.3408           0.3658        0.3408
FACOS with only DATYS+                          0.8620           0.8723        0.8530

The weighting factor is an importance factor that affects how well FACOS performs. We select its value based on the best performance on the training data, and we analyze whether this strategy also leads to the best performance on the testing data. To do so, we vary the value of the weighting factor on both the training data and the testing data. The values that we use are 0, 0.1, 0.2, ..., 0.9, 1.0. We then check whether the value that leads to the best performance on the training data also leads to the best performance on the testing data.
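The selection strategy amounts to a small grid search, sketched below. The F1-scores are copied from the tables in Section VI-C (training values from the table captioned for training sets, Table VI, and testing values from the table captioned for testing sets, Table V); the function name `select_weight` is hypothetical.

```python
# Average F1-score per weighting factor, copied from Tables VI and V.
train_f1 = {0.0: 0.6473, 0.1: 0.7080, 0.2: 0.8269, 0.3: 0.8329, 0.4: 0.8254,
            0.5: 0.8097, 0.6: 0.8079, 0.7: 0.8079, 0.8: 0.8079, 0.9: 0.8079,
            1.0: 0.8073}
test_f1 = {0.0: 0.3408, 0.1: 0.4441, 0.2: 0.8641, 0.3: 0.8730, 0.4: 0.8689,
           0.5: 0.8097, 0.6: 0.8510, 0.7: 0.8521, 0.8: 0.8521, 0.9: 0.8521,
           1.0: 0.8530}


def select_weight(f1_by_weight):
    """Return the weighting factor with the highest average F1-score."""
    return max(f1_by_weight, key=f1_by_weight.get)


chosen = select_weight(train_f1)        # picked using training data only
best_on_test = select_weight(test_f1)   # best possible choice on testing data
print(chosen, best_on_test, chosen == best_on_test)  # 0.3 0.3 True
```

The comparison in the last line is the check that RQ3 asks for: whether the value picked on the training data is also the best one on the testing data.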

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. In general, FACOS outperforms DATYS. On average, FACOS achieves an F1-score of 0.873, which is an improvement of 13.9% (0.139 in absolute terms) over DATYS. FACOS also beats DATYS in terms of precision and recall.

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the "API relevance classifier-only" version of FACOS gives the worst result, the API relevance classifier may not be able to solve the task well on its own. This may be partly due to the limited amount of training data (i.e., only 253 training threads). In addition, the "DATYS+ only" version of FACOS performs much better than the "API relevance classifier-only" version. However, the full FACOS is still better than both of them, which demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table VI shows the performance of FACOS on the training set when we vary the value of the weighting factor x. The bold numbers in the table are the average F1-scores of the chosen value of x on the training set. Similarly, Table V shows the performance of FACOS on the testing set when we vary the value of the weighting factor. The highest F1-score on both the training and testing sets is achieved when the weighting factor is equal to 0.3. This demonstrates that our strategy of picking the value of the weighting factor that leads to the best performance on the training data works well.

TABLE V: Average Precision, average Recall, and average F1-score of testing sets when weighting factor varies

x     Avg. Precision   Avg. Recall   Avg. F1-score
0     0.3420           0.3658        0.3408
0.1   0.4485           0.4653        0.4441
0.2   0.8650           0.8925        0.8641
0.3   0.8697           0.9016        0.8730
0.4   0.8684           0.8934        0.8689
0.5   0.8588           0.8723        0.8097
0.6   0.8588           0.8723        0.8510
0.7   0.8606           0.8723        0.8521
0.8   0.8606           0.8723        0.8521
0.9   0.8606           0.8723        0.8521
1.0   0.8620           0.8723        0.8530

TABLE VI: Average Precision, average Recall, and average F1-score of training sets when weighting factor varies

x     Avg. Precision   Avg. Recall   Avg. F1-score
0     0.6565           0.6685        0.6473
0.1   0.7111           0.7272        0.7080
0.2   0.8261           0.8506        0.8269
0.3   0.8328           0.8498        0.8329
0.4   0.8265           0.8410        0.8254
0.5   0.8159           0.8234        0.8097
0.6   0.8132           0.8234        0.8079
0.7   0.8132           0.8234        0.8079
0.8   0.8132           0.8234        0.8079
0.9   0.8132           0.8234        0.8079
1.0   0.8180           0.8191        0.8073

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method.

Figure 4 shows an example of a case where the content of the thread does not relate to the given API method. It shows the paragraphs and code snippet of the Stack Overflow thread with ID 56135373³. org.mockito.stubbing.OngoingStubbing.thenReturn⁴ is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only shows the user's view of the code snippet, without a description mentioning the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as "This works like charm" do not provide much information for identifying whether the observed API method refers to the given API.

Fig. 4: Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread.

Fig. 5: Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread.

³ https://stackoverflow.com/questions/56135373
⁴ https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html

Therefore, we leverage the content of the thread, which might be relevant to the content of the API method. For example, in the thread above, its title, which is shown in Figure 6a, "Optional cannot be returned by stream() in Mockito Test classes", relates to the comment of the given API, which is "Sets a return value to be returned when the method is called", in Figure 6b. Thanks to this feature, FACOS successfully considers this thread relevant, while DATYS misses it.

(2) The irrelevant thread contains the type name of the given API method.

An example of this case is shown in Figure 5. In the thread⁵, the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, "is", which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learnt by the API relevance classifier, FACOS is able to detect the irrelevance between the textual content and code snippet around the word "is" and the API comment and implementation code. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

⁵ https://stackoverflow.com/questions/16919751

Fig. 6: The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn (a) and the textual content (i.e., the title) of thread 56135373 (b).

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude the thread⁶ from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when there is another API method whose functionality is similar to that of the given API method. Such methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service⁷). Since both of them have an API method whose simple name is mock, and both mock methods have the same API signature (i.e., parameters and return type), it would be easy to mistake one for the other, even for a human. Figure 8 shows the comment of the API method org.mockito.Mockito.mock⁸, which is "Creates mock object of given class or interface". Because of the similarity between the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly recognizes that the simple API name mock in the thread refers to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.

⁶ https://stackoverflow.com/questions/30127057
⁷ https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
⁸ https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7: Thread 30127057 on Stack Overflow that FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8: The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. For some libraries, methods are chained (as in Figure 4), so the type is not displayed, and a type system is required to determine it. In addition, the type system is not 100 percent reliable due to missing import information, variables, etc. It is also ineffective to rely on the similarity of word representations provided by language models alone for API content search. API content search differs from other tasks such as code search. In code search, because the terms "transfer" and "convert" are both related to currency, the method named "transferSGDToUSD" might be the answer to the query "convert SGD to USD". However, searching for an API method requires an explicit mention of the API name in the query, which could be difficult as different APIs may share the same API method names. Solving the problem solely with a model that only learns the syntactic meaning of textual and code content is insufficient. Therefore, we incorporated semantic information into API content search, which generated better results, as the results of our experiment demonstrate.

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtain our dataset from another work, and we run the baseline using the code provided by its authors. We also checked our code multiple times to ensure that we did not make mistakes. We therefore believe the threats to internal validity are small. We also release the dataset and the code for our experiments for all to use.

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. This experiment focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also talk about API issues. One potential platform is Reddit, which has subreddits (i.e., places gathering Reddit threads that discuss a particular problem) on programming languages and frameworks. Its threads have a title, textual content, and code content, just like those on Stack Overflow. This similarity suggests that we can potentially apply FACOS to Reddit too; we leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features required by the approach (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) can be found in other programming languages. We leave it as future work to study whether the approach works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task, and many works in software engineering have used precision, recall, and F1-score as evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threat is minimal.

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. They fall into two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As suggested by their names, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches, such as the Vector Space Model and Latent Semantic Indexing, to disambiguate API mentions [18]–[20], while others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API in a thread, the disambiguated API is relevant to the thread, as the thread is talking about the API.


B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding ability in existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. Rather than assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a single user query into a fixed-length context vector to generate an API sequence.

Other studies have exploited different aspects of APIs and natural language to better retrieve APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to compute document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built a DistilBERT-based argumentation retrieval approach for answering comparative questions. Dibia [34] and Vale and Maia [35] developed a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model, respectively. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API that we are searching for.

C. Contribution of Stack Overflow for API documentation

Treude and Robillard [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas Squire [38] examined Stack Overflow as social media for developer support in terms of the utility it provides. Ahasanuzzaman et al. [39] and Baltes et al. [40] worked on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow, respectively. There is a notable dichotomy among these studies: some research, such as [41] and [42], studies how API documentation fails and how APIs are misused on Stack Overflow, whereas other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow, for example for automated API documentation with tutorials. Similarly, crowdsourced knowledge was embraced by [45] and [46], who explored innovation diffusion through link sharing and web resource recommendation for hyperlinks on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find Stack Overflow threads about a particular API, which can then be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

Several works focus on disambiguation tasks [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have addressed a wide range of problems by resolving lexical ambiguities. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Researchers have also associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Researchers have also added meaning to social network posts through named entity recognition and disambiguation. Entity-fishing tools were also developed to facilitate recognition and disambiguation services. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We have added a weight parameter to balance the use of syntactic and semantic information for retrieving API mentions and related threads, and we have demonstrated the utility of this weighting factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package: The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, "What makes apis hard to learn? answers from developers," IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, "What do client developers concern when using web apis? an empirical study on developer forums and stack overflow," in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, "How do api changes trigger stack overflow discussions? a study on the android sdk," in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, "Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow," Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, "Automatic mining of opinions expressed about apis in stack overflow," IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, "Disambiguating mentions of api methods in stack overflow via type scoping," in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, "Rack: Automatic api recommendation using crowdsourced knowledge," in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., "Codebert: A pre-trained model for programming and natural languages," arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "Electra: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, "Apireal: an api recognition and linking approach for online developer forums," Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, "Identifying self-admitted technical debt in open source projects using text mining," Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, "Categorizing the content of github readme files," Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, "Learning from examples to find fully qualified names of api elements in code snippets," in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, "Recovering traceability links between code and documentation," TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, "Recovering documentation-to-source-code traceability links using latent semantic indexing," in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, "Linking e-mails and source code artifacts," in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, "Recovering traceability links between an api and its learning resources," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, "Live api documentation," in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, "Statistical learning of api fully qualified names in code snippets of online forums," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, "Codehow: Effective code search based on api understanding and extended boolean model (e)," in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, "Deep api learning," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, "Complementing global and local contexts in representing api descriptions to improve api retrieval tasks," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, "Leveraging usage similarity for effective retrieval of examples in code repositories," in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE '10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, "From word embeddings to document similarities for improved information retrieval in software engineering," in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE '16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, "Exploiting user knowledge during retrieval of semantically annotated api operations," in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR '11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, "Trans^3: A transformer-based framework for unifying code summarization and code search," arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, "Api2com: On the improvement of automatically generated code comments using api documentations," arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, "Distilbert-based argumentation retrieval for answering comparative questions," Working Notes of CLEF, 2021.

[34] V. Dibia, "Neuralqa: A usable library for question answering (contextual query expansion+ bert) on large datasets," arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, "Towards a question answering assistant for software development using a transformer-based language model," arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, "An empirical study on the usage of transformer models for code completion," arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, "Augmenting api documentation with insights from stack overflow," in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, ""should we move to stack overflow?" measuring the utility of social media for developer support," in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, "Classifying stack overflow posts on api issues," in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, "Contextual documentation referencing on stack overflow," IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, "How api documentation fails," IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, "Are code examples on an online q&a forum reliable?: a study of api misuse on stack overflow," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, "Crowdsourced knowledge on stack overflow: A systematic mapping study," in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, "Automated api documentation with tutorials generated from stack overflow," in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, "A study of innovation diffusion through link sharing on stack overflow," in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, "From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, "Statistical translation of english texts to api code templates," in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, "Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance," International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, "Geotxt: A web api to leverage place references in text," in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR '13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, "Senserelate:: Targetword-a generalized framework for word sense disambiguation," in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation," in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, "entity-fishing: a dariah entity recognition and disambiguation service," Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, "Do we need entity-centric knowledge bases for entity disambiguation?" in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, "Geek: Incremental graph-based entity disambiguation," in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, "Combining heterogeneous classifiers for word sense disambiguation," in Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, "A fully unsupervised word sense disambiguation method using dependency knowledge," in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.


  • I Introduction
  • II Preliminaries
    • II-A DATYS
    • II-B CodeBERT
  • III Approach Overview
    • III-A Task Definition
    • III-B Architecture
  • IV FACOS
    • IV-A DATYS+
    • IV-B API relevance embedding
    • IV-C API relevance classifier
    • IV-D Computing joint relevance score
  • V Experiment
    • V-A Dataset and Experimental Settings
    • V-B Metrics
    • V-C Research Questions
  • VI Result
    • VI-A RQ1: FACOS Effectiveness
    • VI-B RQ2: Ablation Study
    • VI-C RQ3: Effect of the weighting factor
  • VII Discussion
    • VII-A Cases where FACOS outperforms DATYS
    • VII-B Case where FACOS fails to exclude irrelevant threads
    • VII-C Adding Semantic Information for API Content Search
    • VII-D Threats to validity
  • VIII Related Work
    • VIII-A API Disambiguation
    • VIII-B API Resource Retrieval
    • VIII-C Contribution of Stack Overflow for API documentation
    • VIII-D Word Sense and Entity Disambiguation Study
  • IX Conclusion and Future Work
  • References

In addition to the weighted syntactic information provided by DATYS+, FACOS has a semantic search component that leverages a deep attention-based Transformer model, CodeBERT [9]. This semantic search component measures the similarity between the paragraphs and code snippets of a Stack Overflow thread and the comment and implementation code of the target API method. The more similar they are, the more likely the thread is to be relevant to the target API method. To leverage both the semantic and the syntactic knowledge of the Stack Overflow thread and the API method, FACOS combines the semantic and syntactic search components into a joint relevance score that determines whether a thread relates to a given API method. The contribution of each component to the joint relevance score is defined by a weighting factor.

In this paper, we answer the following research questions:
RQ1: Can FACOS perform better than the baseline (DATYS)?
RQ2: How well does each component of FACOS perform?
RQ3: How does the weighting factor affect the F1-score of FACOS?

These research questions help us understand the effectiveness of our approach, FACOS, and the internal mechanism through which it yields better results than the current baseline. Our work offers the following main contributions:

1) To our knowledge, we are the first to adopt a Transformer-based deep learning technique to incorporate semantic knowledge into the API resource retrieval task.

2) Compared to state-of-the-art techniques, our approach performs better when searching for content related to the queried API. On a dataset of 380 Stack Overflow threads, FACOS beats the state-of-the-art by 13.9% in terms of F1-score.

3) We designed an ablation study to understand how the integrated components of our approach perform. We found that each component contributes to the effectiveness of FACOS.

4) We have open-sourced our code and the additional artifacts required for recreating the results and repurposing our approach for other tasks. The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

The rest of the paper is structured as follows. Section II covers the preliminary knowledge about the components on top of which we have built our method. Section III provides an overview of our proposed approach, while Section IV elaborates on its components. We describe our experiment details and results in Sections V and VI, respectively. The related work and the threats to validity are presented in Sections VIII and VII-D. Finally, we conclude our work and present future work in Section IX.

II. PRELIMINARIES

A. DATYS

Two steps are involved in finding the APIs mentioned in informal text content: (1) API mention extraction and (2) API mention disambiguation. API mention extraction aims to identify common words that refer to APIs. API mention disambiguation, on the other hand, links API mentions with the APIs they reference. DATYS [6] specifically deals with API mention disambiguation via type scoping in the informal text of Stack Overflow. It resolves ambiguous mentions of Java API methods, assuming that the mentions have already been identified.

After extracting API method candidates from the input Java libraries, DATYS scores the candidates based on how often their types (i.e., classes or interfaces) appear in different parts (i.e., scopes) of a Stack Overflow thread with identified API mentions. A type that appears in more scopes yields a higher candidate score. DATYS considers three scopes: the Mention scope, which covers the mention itself; the Text scope, which covers the textual content of the thread, including the mentions; and the Code scope, which covers the code snippets in the thread. For each API mention in the thread, the API candidates are ranked according to their scores. DATYS takes the top API candidate with a non-zero score as the mentioned API. If the leading API candidate has a zero score, DATYS treats the mention as an unknown API. Luong et al. built a ground-truth dataset containing 807 Java API mentions from 380 Stack Overflow threads.
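The selection rule at the end of this step (take the top candidate only if its score is non-zero, otherwise report an unknown API) can be illustrated with a minimal sketch. The function name `resolve_mention` and the use of `None` for "unknown API" are assumptions made for illustration; they are not taken from the DATYS implementation, and tie-breaking between equal scores is left to dictionary order here.

```python
from typing import Dict, Optional


def resolve_mention(candidate_scores: Dict[str, int]) -> Optional[str]:
    """Pick the API candidate a mention refers to, given per-candidate scores.

    Returns the fully qualified name of the top-scoring candidate, or None
    (our hypothetical marker for "unknown API") when the best score is zero
    or there are no candidates at all.
    """
    if not candidate_scores:
        return None
    # max() keeps the first candidate encountered among equal scores.
    best = max(candidate_scores, key=candidate_scores.get)
    return best if candidate_scores[best] > 0 else None
```

For instance, with scores `{"a.B.m": 2, "c.D.m": 1}` the sketch selects `a.B.m`, while a dictionary whose best score is 0 yields `None`.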

B. CodeBERT

CodeBERT [9] was developed on top of BERT [10], a multilayer attention-based Transformer model. Because BERT learns contextual representations effectively from massive unlabeled text with self-supervised objectives, it has been widely adopted to develop large pre-trained models. Building on the multilayer Transformer [11], the CodeBERT developers departed from BERT in two ways to learn the semantic connections between natural language (NL) and programming language (PL) more effectively.

First, the CodeBERT developers make use of both bimodal instances of NL-PL pairs (i.e., code snippets with function-level comments or documentation) and a large amount of available unimodal code. Second, they pre-trained CodeBERT with a hybrid objective function that includes masked language modeling [10] and replaced token detection [12]. The unimodal code helps the replaced token detection task, which in turn produces better natural language by detecting plausible alternatives sampled from generators.

CodeBERT was trained on GitHub code repositories in six programming languages. A single pre-trained model is learned for all six languages, with no explicit indicator of which language an input instance belongs to. CodeBERT was evaluated on two downstream tasks: natural language code search and code documentation generation. The study found that fine-tuning the parameters of CodeBERT obtained state-of-the-art results on both tasks.

Fig. 1. The architecture of FACOS (Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis). [Diagram labels: User or Tool; A given API method; Query using the FQN of the given API method; Potential threads; API Candidates; API comment & API implementation code; Step 1; Step 2; FACOS (DATYS+, API Relevance Classifier); Predicted relevant threads; Return.]

III. APPROACH OVERVIEW

A. Task Definition

Our goal is to find Stack Overflow threads that mention a given API method¹. Specifically, given an API method, we strive to find Stack Overflow threads containing words that match the simple name of the given API method. In Java, the simple name of an API method is the name of the method without the class and package names. For example, m is the simple name of the API method com.example.Class.m. We want to classify whether a thread containing the simple name m is actually relevant to the API method com.example.Class.m. In summary, the task is defined as: "For each API method in a set of given API methods, identify Stack Overflow threads that refer to it."
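To make the notion of a simple name concrete, the following minimal sketch derives it from a fully qualified name and keeps only the threads that contain it as a whole word. The helper names (`simple_name`, `threads_matching_simple_name`), the thread IDs, and the word-boundary matching rule are illustrative assumptions; the paper does not specify how the word match is implemented.

```python
import re
from typing import Dict, List


def simple_name(fully_qualified_name: str) -> str:
    """Return the method name without the package and class names."""
    return fully_qualified_name.rsplit(".", 1)[-1]


def threads_matching_simple_name(fqn: str, threads: Dict[str, str]) -> List[str]:
    """Return IDs of threads whose text contains the simple name as a word.

    `threads` maps a (hypothetical) thread ID to the raw thread text.
    """
    pattern = re.compile(r"\b" + re.escape(simple_name(fqn)) + r"\b")
    return [tid for tid, text in threads.items() if pattern.search(text)]
```

A thread selected this way only matches the simple name; deciding whether it actually refers to the given API method is the classification task defined above.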

B. Architecture

The pipeline of FACOS is presented in Figure 1. It is divided into two main steps: (1) collecting various API-related resources from a given API method name and (2) recommending relevant threads using the collected API-related resources.

In step (1), FACOS finds Potential Threads on Stack Overflow using the simple name of the given API method as the query. Potential Threads are the threads that have at least one word matching the simple name of the given API. The API method comment and implementation code are obtained directly from the source code repository of the given API. Finally, the API Candidates are obtained from a database of API methods. The API Candidates are API methods that have the same simple name as the given API method.

¹ In this paper, we use the terms API and API method interchangeably.

The objective of step (2) is to identify whether each Stack Overflow thread in the Potential Threads actually refers to the given API. FACOS has two components: the API relevance classifier and DATYS+. The API relevance classifier is designed to determine the relevance between a thread and an API method by capturing the semantic similarity between (1) the paragraphs and code snippets in the thread and (2) the API method comment and implementation code. It outputs a semantic relevance score representing the relevance it measures. In contrast, DATYS+ outputs a syntactic relevance score based on the presence of terms from the fully qualified name of the given API in different scopes of a Stack Overflow thread. For example, the API "A.B.C" has the terms "A", "B", and "C". The last term, "C", is the simple name of the API method. The second-to-last term, "B", is the type of the API method. Both DATYS and DATYS+ use type scoping [6] to assign a score based on the presence of the type of the API (i.e., "B" in the example) in different scopes of the Stack Overflow thread (code scope, text scope, etc.). However, the type scoping of DATYS+ is modified to suit the search task, as described in Section IV-A. DATYS+ outputs a score that indicates the syntactic relevance between the given API and the thread; we call it the DATYS+ score.

After step (2), each thread has a score indicating whether the thread refers to the given API method. This score combines the semantic relevance score and the DATYS+ score, and is called the joint relevance score. Threads predicted to refer to the given API method are then returned to the user. We describe the FACOS components (i.e., DATYS+ and the API relevance classifier) in detail in Section IV.

IV. FACOS

FACOS consists of two main components: DATYS+ and the API relevance classifier. DATYS+ takes Potential Threads and API Candidates as inputs and outputs scores indicating its confidence that the given API is referred to in the threads (Section IV-A). Given Potential Threads and the API method comment and implementation code, FACOS first converts them into API relevance embeddings (Section IV-B). The API relevance embeddings are input to the API relevance classifier, which outputs confidence scores indicating the likelihood that the given threads refer to the API (Section IV-C). Finally, the scores from DATYS+ and the API relevance classifier are combined into a joint relevance score, and threads with scores larger than a threshold are returned as the relevant threads (Section IV-D).

A. DATYS+

DATYS+ is an extension of DATYS. DATYS uses regular expressions to capture the types of API method invocations available in the code snippets of the thread. However, these regular expressions are limited, and thus DATYS may miss some mentions in code snippets. To capture more types, DATYS+ modifies the type scoping algorithm by adding a new score.

Algorithm 1 shows how the modified type scoping works. Compared to that of DATYS, the type scoping algorithm of DATYS+ receives CodeSnippets as an additional input. CodeSnippets represents the content of the code snippets in the Stack Overflow thread. The inputs of the original type scoping algorithm are also used: ApiMention, PTypesList, APIMethodCandidate, and ThreadContent stand for the simple name of the given API, the list of possible types extracted from code snippets following the algorithm used by DATYS, the API Candidates, and the thread's textual content (i.e., title, text, tags), respectively. The three scopes used by DATYS are also used in DATYS+. In the Mention Scope (Lines 3-8), DATYS+ increases an API candidate's score if its type appears within the API mention. In the Text Scope (Lines 10-13), DATYS+ increases the score if the type appears within the textual content of the thread. In the Code Scope (Lines 17-21), DATYS+ increases the score if the type matches the type of a method invocation or an imported type in the code snippet. Additionally, DATYS+ looks at the content of the code snippets and increases the score of the corresponding API candidate if any token in the code snippets matches the API type (Lines 14-16). This score helps to capture occurrences of types that would be missed by the stricter matching used in DATYS. We therefore call the scope of this score the Extended Code Scope.

After executing type scoping, DATYS+ returns scores for the API Candidates. The scores are then normalized to the range [0, 1] using the minimum and the maximum scores among the API Candidates. DATYS+ then takes the normalized score of the given API method and passes it to the next step.
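The type scoping of Algorithm 1 and the min-max normalization described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the tokenizer is a plain identifier regex, `isSameType` is reduced to string equality, the candidate type is taken to be the second-to-last term of the fully qualified name, and returning 0.0 for every candidate when all scores are equal is our own assumption, since the paper does not discuss that case.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into identifier-like tokens (simplified tokenizer)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)


def get_type(candidate_fqn: str) -> str:
    """Second-to-last term of the FQN, e.g. 'B' for 'A.B.C'."""
    parts = candidate_fqn.split(".")
    return parts[-2] if len(parts) >= 2 else ""


def score_candidate(api_mention: str, p_types: List[str], candidate_fqn: str,
                    thread_content: str, code_snippets: str) -> int:
    """Score one API candidate, following the scopes of Algorithm 1."""
    cand_type = get_type(candidate_fqn)
    if not cand_type:
        return 0
    score = 0
    # Mention scope: the mention carries a prefix ending with the type.
    if "." in api_mention:
        prefix = api_mention.rsplit(".", 1)[0]
        if prefix.endswith(cand_type):
            score += 1
    # Text scope: the type appears in the textual content.
    if cand_type in tokenize(thread_content):
        score += 1
    # Extended code scope: the type appears as a token in the code snippets.
    if cand_type in tokenize(code_snippets):
        score += 1
    # Code scope: the type matches a type extracted from the code snippets.
    for p_type in p_types:
        if p_type == cand_type:  # stand-in for isSameType
            score += 1
    return score


def normalize(scores: Dict[str, int]) -> Dict[str, float]:
    """Min-max normalize candidate scores to the range [0, 1]."""
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # Assumption: with no spread, treat every candidate as 0.0.
        return {name: 0.0 for name in scores}
    return {name: (s - lowest) / (highest - lowest) for name, s in scores.items()}
```

In use, `score_candidate` would be called once per API Candidate, the resulting dictionary passed to `normalize`, and the normalized score of the given API method taken as the DATYS+ score.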

Algorithm 1: Scoring an API Candidate with Type Scoping in DATYS+
Input: ApiMention, PTypesList, APIMethodCandidate, ThreadContent, CodeSnippets
Output: CandScore
 1: CandScore = 0
 2: CandType = getType(APIMethodCandidate)
 3: if hasPrefix(ApiMention) then
 4:     Prefix = getPrefix(ApiMention)
 5:     if endsWith(Prefix, CandType) then
 6:         CandScore = CandScore + 1
 7:     end if
 8: end if
 9: TextualTokens = tokenize(ThreadContent)
10: CodeTokens = tokenize(CodeSnippets)
11: if CandType in TextualTokens then
12:     CandScore = CandScore + 1
13: end if
14: if CandType in CodeTokens then
15:     CandScore = CandScore + 1
16: end if
17: for PType in PTypesList do
18:     if isSameType(PType, CandType) then
19:         CandScore = CandScore + 1
20:     end if
21: end for
22: return CandScore

B. API relevance embedding

We follow the process described in Figure 2 to build the API relevance embedding. First, each thread in Potential Threads needs to be converted into an embedding. A thread may contain m paragraphs and n code snippets. A paragraph is a piece of textual content in a Stack Overflow thread that is separated from other content in the thread by a newline character. A code snippet is a piece of code content in a Stack Overflow thread. It is typically enclosed by a starting tag <pre><code> and an ending tag </code></pre>. Each paragraph is paired with each code snippet to create a thread content pair. Therefore, a Stack Overflow thread has m × n thread content pairs. A natural-programming language model, CodeBERT², is used to extract the semantic meaning of each thread content pair. It encodes the m × n thread content pairs into m × n thread embeddings. A thread embedding is the representation vector of a thread content pair created by CodeBERT's encoder. By converting the pairs from textual form to numerical vector form with a pre-trained CodeBERT model, the semantic relationship between the paragraphs and code snippets is extracted. Before feeding the thread content pairs into the encoder of CodeBERT, each pair is pre-processed into the following format:

〈CLS〉 paragraph 〈SEP〉 code snippet 〈EOS〉

〈CLS〉 is the token that marks the start of the pair, following the design of the RoBERTa model [13], on which CodeBERT is based. 〈SEP〉 is the token that separates a paragraph from a code snippet, and 〈EOS〉 indicates the end of the pair. The maximum number of tokens in a pair fed into the CodeBERT encoder is 512. We set the number of tokens for a paragraph and a code snippet to 254 and 255, respectively. The two numbers add up to 512 when the three special tokens 〈CLS〉, 〈SEP〉, and 〈EOS〉 are counted (254 + 255 + 3 = 512). If the paragraph has fewer than 254 tokens, padding tokens are added to reach 254 tokens. If the paragraph has more than 254 tokens, we truncate it and keep the first 254 tokens. The same process is applied to the code snippet with 255 tokens. The CodeBERT encoder receives the thread content pairs in this format as inputs and outputs embedding vectors. For a thread with m × n thread content pairs, m × n thread embedding vectors are created, and each thread embedding vector has a length of 768.

² https://github.com/microsoft/CodeBERT
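The truncation and padding rule for a thread content pair can be sketched as below. The sketch works on already-tokenized lists and uses placeholder strings for the special tokens; in practice the tokens would come from the CodeBERT tokenizer, whose actual special-token strings may differ. The names `fit_to_length` and `build_pair` are hypothetical.

```python
from typing import List

# Placeholder special tokens (assumed for illustration only).
CLS, SEP, EOS, PAD = "<CLS>", "<SEP>", "<EOS>", "<PAD>"
MAX_PARAGRAPH_TOKENS = 254
MAX_CODE_TOKENS = 255


def fit_to_length(tokens: List[str], length: int) -> List[str]:
    """Keep the first `length` tokens, padding on the right if too short."""
    kept = tokens[:length]
    return kept + [PAD] * (length - len(kept))


def build_pair(paragraph_tokens: List[str], code_tokens: List[str]) -> List[str]:
    """Assemble <CLS> paragraph <SEP> code snippet <EOS> with 512 tokens."""
    return ([CLS]
            + fit_to_length(paragraph_tokens, MAX_PARAGRAPH_TOKENS)
            + [SEP]
            + fit_to_length(code_tokens, MAX_CODE_TOKENS)
            + [EOS])
```

Every pair built this way has exactly 254 + 255 + 3 = 512 tokens, regardless of how long the paragraph and the code snippet originally were.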

Second, to build the API relevance embedding, the API comment and implementation code also need to be converted into an embedding. The API method comment is a piece of textual content that describes the functionality of the API method and how to use it. The API implementation code is the code inside the API method body that implements the described functionality. The API comment and implementation code are extracted from the Javadoc and the JAR files, respectively. They are pre-processed into the following format:

〈CLS〉 comment 〈SEP〉 implementation code 〈EOS〉

They are then transformed into a numerical representation vector via the CodeBERT encoder.

Fig. 2. How API relevance embeddings are created. [Diagram labels: A Thread; m Paragraph; n Code Snippet; m x n Thread Content pairs; A given API method; API method comment; API method implementation code; 1 Method Content pair of the given API; CodeBERT; m x n Thread embeddings (each has 768d); 1 Method embedding (768d); Concatenate; m x n API relevance embeddings (each has 1536d); If Thread refers the given API method.]

Finally, each thread embedding vector is concatenated with the method embedding vector. We call the concatenated vector the API relevance embedding. In total, m × n API relevance embedding vectors are created, each of length 1536 (768 + 768).
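The pairing and concatenation steps can be sketched as follows. The encoder is passed in as a function so that the sketch stays self-contained; `toy_encoder` is a deterministic stand-in that only mimics the 768-dimensional output shape and is in no way a substitute for CodeBERT. All names here are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence

EMBEDDING_DIM = 768
Vector = List[float]
Encoder = Callable[[str, str], Vector]


def api_relevance_embeddings(paragraphs: Sequence[str], snippets: Sequence[str],
                             comment: str, implementation: str,
                             encode: Encoder) -> List[Vector]:
    """Build one API relevance embedding per (paragraph, code snippet) pair."""
    # A single method embedding is shared by all pairs of the thread.
    method_embedding = encode(comment, implementation)
    # m paragraphs x n snippets -> m * n concatenated vectors.
    return [encode(paragraph, snippet) + method_embedding
            for paragraph, snippet in product(paragraphs, snippets)]


def toy_encoder(text: str, code: str) -> Vector:
    """Stand-in encoder: returns a fixed-size vector derived from lengths."""
    seed = (len(text) * 31 + len(code)) % 97
    return [float((seed + i) % 7) for i in range(EMBEDDING_DIM)]
```

With 3 paragraphs and 2 code snippets, the function returns 6 vectors of length 1536, whose second halves are all equal to the method embedding.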

C. API relevance classifier

The API relevance classifier is a binary classifier that uses a neural network with two fully connected layers to predict whether an API relevance embedding comes from a Stack Overflow thread that refers to the given API method.

The API relevance classifier has two modes of operation: training and deployment. In the training mode, the API relevance embeddings are used to train the classifier. When there is an imbalance between positive and negative labels, the minority label is upsampled. If the thread refers to the given API method, all API relevance embeddings created from the thread are labeled positive. Otherwise, if the given API method is not referred to by the thread, every API relevance embedding of the thread is labeled negative.

Fig. 3. Computing joint relevance score. [Diagram labels: Input (a thread & a given API); DATYS+ outputs Score A of the given API method; API relevance classifier outputs Average probability B of the Positive class; combine: Joint Relevance Score C = x A + (1-x) B; if C > threshold: Yes, Thread refers to API; No, Thread does not refer to API.]

In the deployment mode, the API relevance classifier produces probability scores for the m × n API relevance embeddings. These scores are averaged and passed to the next step. The averaged score indicates the likelihood that the thread refers to the given API.

D. Computing joint relevance score

We follow the process in Figure 3 to compute the joint relevance score. DATYS+ and the API relevance classifier output scores A and B, respectively. Both represent their confidence that the given API method is mentioned in the thread. The two scores are combined into a joint relevance score C using the following formula:

$$C = x \times A + (1 - x) \times B \tag{1}$$

The weighting factor x determines the contributions of the DATYS+ score and the API relevance classifier score to the joint relevance score C. The higher the value of x, the more the DATYS+ score contributes to the final joint relevance score. A, B, and x all range from 0 to 1. A thread is considered to refer to the given API if the joint relevance score C is larger than a threshold t. Otherwise, the thread is considered not to refer to the given API. By default, t is set to 0.5.
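Equation (1) and the threshold rule can be written as a short sketch. Here B is computed as the average of the per-embedding positive-class probabilities, as in the deployment mode described earlier; the function names are hypothetical, and rejecting an empty probability list is our own choice.

```python
from statistics import mean
from typing import Sequence


def joint_relevance(datys_plus_score: float, probabilities: Sequence[float],
                    x: float) -> float:
    """Compute C = x * A + (1 - x) * B, with B the mean probability."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("weighting factor x must lie in [0, 1]")
    if not probabilities:
        raise ValueError("at least one probability score is required")
    b = mean(probabilities)
    return x * datys_plus_score + (1.0 - x) * b


def refers_to_api(datys_plus_score: float, probabilities: Sequence[float],
                  x: float, threshold: float = 0.5) -> bool:
    """A thread refers to the API when C is strictly above the threshold."""
    return joint_relevance(datys_plus_score, probabilities, x) > threshold
```

As a worked instance, A = 1.0, probabilities 0.2 and 0.4 (so B = 0.3), and x = 0.3 give C = 0.3 × 1.0 + 0.7 × 0.3 = 0.51, which is above the default threshold of 0.5.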

TABLE I: Number of API relevance embeddings in each set

Set             API relevance embeddings
Training set    57,690
Testing set     26,212

The value of x is estimated from the training data. Specifically, we let x increase from 0 to 1 in steps of 0.1, which gives eleven possible values of x: 0, 0.1, 0.2, ..., 0.9, 1.0. The value of x that gives the highest performance on the training data is then chosen.
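The selection of the weighting factor amounts to a small grid search over x = 0, 0.1, ..., 1.0. The sketch below assumes a caller-supplied function `evaluate_f1` (hypothetical) that returns the average F1-score on the training data for a given x; ties are resolved in favor of the smallest x, which is an assumption since the paper does not state a tie-breaking rule.

```python
from typing import Callable, List, Tuple


def candidate_weights(steps: int = 10) -> List[float]:
    """Return 0.0, 0.1, ..., 1.0 without accumulating rounding error."""
    return [i / steps for i in range(steps + 1)]


def pick_weight(evaluate_f1: Callable[[float], float]) -> Tuple[float, float]:
    """Return (x, F1) for the x with the highest training F1-score."""
    best_x, best_f1 = 0.0, float("-inf")
    for x in candidate_weights():
        f1 = evaluate_f1(x)
        if f1 > best_f1:  # strict comparison keeps the first best value
            best_x, best_f1 = x, f1
    return best_x, best_f1
```

Generating the grid as `i / steps` rather than repeatedly adding 0.1 avoids floating-point drift in the candidate values.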

V. EXPERIMENT

A. Dataset and Experimental Settings

We use the dataset provided in the DATYS work [6] to evaluate both FACOS and DATYS. We split the 380 Stack Overflow threads into 253 training threads and 127 testing threads, a ratio of 2:1. The training threads are used to train the API relevance classifier, while the testing threads are used to evaluate FACOS and DATYS. Next, as mentioned in Section IV-B, for each Stack Overflow thread in the training threads, we extract its thread embeddings, and these thread embeddings are grouped into a training set. Similarly, for each Stack Overflow thread in the testing threads, we extract its thread embeddings, and these thread embeddings are grouped into a testing set. The numbers of API relevance embeddings in the dataset are shown in Table I. The training set and the testing set contain 57,690 and 26,212 embeddings, respectively.

To generate API relevance embeddings for training the API relevance classifier, for each thread, if the given API appears in the thread, we generate API relevance embeddings from the thread contents and method contents as described in Section IV-B. These embeddings have a positive label because they are created from the API that is referred to by the thread. To generate embeddings with a negative label for a thread, we find APIs that have the same simple name as the given API and are not mentioned in the thread. We then create API relevance embeddings from these APIs and label them as negative.

The training set involves 344 APIs, which are used to generate the training method embeddings. The testing set involves 181 APIs, which are used to generate the testing method embeddings. Table II shows the numbers of positive and negative API relevance embeddings created in the training and testing sets. Within each set, there are approximately four times as many negative API relevance embeddings as positive ones. Due to this imbalance, positive API relevance embeddings are randomly upsampled to balance the two classes during the training of the API relevance classifier.
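Random upsampling of the minority class can be sketched as follows. The paper does not describe the exact sampling procedure, so sampling with replacement until both classes have the same size, the fixed seed, and the name `upsample_minority` are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# One training example: (API relevance embedding, label), label 1 = positive.
Example = Tuple[List[float], int]


def upsample_minority(examples: List[Example], seed: int = 0) -> List[Example]:
    """Duplicate random minority-class examples until classes are balanced."""
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        # Nothing to sample from; return a copy unchanged.
        return list(examples)
    # Sample with replacement to fill the gap between the two classes.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = list(examples) + extra
    rng.shuffle(balanced)
    return balanced
```

The input list is left untouched; only the returned list contains the duplicated minority examples.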

The API relevance classifier is trained for 6 epochs on the training data. After these 6 epochs, the value of the loss function has largely converged. The learning rate is set to $10^{-3}$.

TABLE II: Number of positive and negative API relevance embeddings in each set

Set             Positive embeddings    Negative embeddings
Training set    9,934                  47,756
Testing set     5,607                  20,605

B. Metrics

To evaluate the proposed approach on identifying threads that are relevant to an API, we use three metrics: Precision, Recall, and F1-score. To calculate these metrics, True Positive, False Positive, and False Negative must be defined first. Our task focuses on finding threads that actually refer to a given API. A True Positive is a thread that is deemed relevant by the approach and is indeed relevant. A False Positive is a thread that is deemed relevant by the approach but is actually irrelevant. A False Negative is a thread that is deemed irrelevant by the approach but is actually relevant. The metrics are calculated using the following formulas:

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

We measure the above scores for all given APIs in the testing set and report the averages of the scores.
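The per-API metrics and their averaging can be sketched as below. Threads are represented by hypothetical IDs, and returning 0.0 when a denominator is zero is an assumed convention that the paper does not specify.

```python
from typing import Dict, Set, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, f1)


def precision_recall_f1(predicted: Set[str], relevant: Set[str]) -> Scores:
    """Compute Equations (2)-(4) for one API from sets of thread IDs."""
    tp = len(predicted & relevant)   # deemed relevant and truly relevant
    fp = len(predicted - relevant)   # deemed relevant but irrelevant
    fn = len(relevant - predicted)   # deemed irrelevant but relevant
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_over_apis(per_api: Dict[str, Tuple[Set[str], Set[str]]]) -> Scores:
    """Average each metric over APIs; values are (predicted, relevant)."""
    scores = [precision_recall_f1(p, r) for p, r in per_api.values()]
    n = len(scores)
    return (sum(s[0] for s in scores) / n,
            sum(s[1] for s in scores) / n,
            sum(s[2] for s in scores) / n)
```

Because the F1-score is computed per API and then averaged, the reported average F1-score need not equal the harmonic mean of the average Precision and the average Recall.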

C. Research Questions

Research Question 1: Can FACOS perform better than the baseline (DATYS)?
The baseline, DATYS, was designed for the task of API mention disambiguation. We adapt it to our task of finding threads that are relevant to an API: if DATYS finds that an API is mentioned in a thread, the thread is considered relevant to the API. To measure the improvement of FACOS over DATYS, we evaluate both on the testing set and compare them in terms of F1-score. We also analyze some cases that FACOS can resolve and DATYS cannot in Section VII-A.

Research Question 2: How well does each component of FACOS perform?
There are three possible variants of FACOS, depending on which components are enabled: (1) FACOS with only the API relevance classifier, (2) FACOS with only DATYS+, and (3) FACOS with both DATYS+ and the API relevance classifier. The API relevance classifier is a semantic-based algorithm, while DATYS+ is a syntactic-based algorithm. In this study, we aim to analyze the contribution of each component of FACOS. From the analysis, we would like to answer whether combining a semantic-based algorithm and a syntactic-based algorithm leads to a better result than running them individually.

Research Question 3: How does the weighting factor affect the F1-score of the relevant thread classification? Does our strategy work well?
The weighting factor is an important factor that affects how well FACOS performs. We select the weighting factor based on the best performance on the training data, and we analyze whether this strategy also leads to the best performance on the testing data. To do so, we vary the value of the weighting factor on both the training data and the testing data, using the values 0, 0.1, 0.2, ..., 0.9, 1.0.

TABLE III: FACOS vs. DATYS in terms of F1-score in the testing set

Approach    Avg. Precision    Avg. Recall    Avg. F1-score
DATYS       0.7441            0.7703         0.7340
FACOS       0.8697            0.9016         0.8730

TABLE IV: Contribution of FACOS Components

Components                                  Avg. Precision    Avg. Recall    Avg. F1-score
FACOS                                       0.8697            0.9016         0.8730
FACOS with only API relevance classifier    0.3408            0.3658         0.3408
FACOS with only DATYS+                      0.8620            0.8723         0.8530

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. FACOS in general outperforms DATYS. On average, FACOS achieves an F1-score of 0.873, which is an absolute improvement of 13.9% over DATYS (0.8730 versus 0.7340). FACOS also beats DATYS in terms of precision and recall.

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the "API relevance classifier-only" version of FACOS gives the worst result (an average F1-score of 0.3408), the API relevance classifier may not be able to resolve the task well on its own. This might be partly due to the limited amount of training data (i.e., only 253 training threads). The "DATYS+ only" version of FACOS performs much better (0.8530) than the "API relevance classifier-only" version. However, the full FACOS (0.8730) is still better than both of them, which demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table VI shows the performance of FACOS on the training set when we vary the value of the weighting factor. Similarly, Table V shows the performance of FACOS on the testing set. The highest F1-score on both the training and the testing sets is achieved when the weighting factor is equal to 0.3 (0.8329 and 0.8730, respectively). This demonstrates that our strategy of picking the value of the weighting factor that leads to the best performance on the training data works well.

TABLE V: Average Precision, average Recall, and average F1-score of testing sets when weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.3420            0.3658         0.3408
0.1    0.4485            0.4653         0.4441
0.2    0.8650            0.8925         0.8641
0.3    0.8697            0.9016         0.8730
0.4    0.8684            0.8934         0.8689
0.5    0.8588            0.8723         0.8097
0.6    0.8588            0.8723         0.8510
0.7    0.8606            0.8723         0.8521
0.8    0.8606            0.8723         0.8521
0.9    0.8606            0.8723         0.8521
1.0    0.8620            0.8723         0.8530

TABLE VI: Average Precision, average Recall, and average F1-score of training sets when weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.6565            0.6685         0.6473
0.1    0.7111            0.7272         0.7080
0.2    0.8261            0.8506         0.8269
0.3    0.8328            0.8498         0.8329
0.4    0.8265            0.8410         0.8254
0.5    0.8159            0.8234         0.8097
0.6    0.8132            0.8234         0.8079
0.7    0.8132            0.8234         0.8079
0.8    0.8132            0.8234         0.8079
0.9    0.8132            0.8234         0.8079
1.0    0.8180            0.8191         0.8073

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method.

Figure 4 shows an example of a case where the content of the thread does not appear to relate to the given API method. It contains a paragraph and a code snippet of the Stack Overflow thread with ID 56135373³. org.mockito.stubbing.OngoingStubbing.thenReturn⁴ is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only shows the user's view of the code snippet, without a description mentioning the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as "This works like charm" do not provide much information to identify whether the observed API method refers to the given API.

³ https://stackoverflow.com/questions/56135373
⁴ https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html

Fig. 4. Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread.

Fig. 5. Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread.

Therefore, we leverage the content of the thread that might be relevant to the content of the API method. For example, in the thread above, its title, shown in Figure 6a, "Optional cannot be returned by stream() in Mockito Test classes", relates to the comment of the given API, "Sets a return value to be returned when the method is called", shown in Figure 6b. Thanks to this feature, FACOS successfully considers this thread relevant, while DATYS misses it.

(2) The irrelevant thread contains the type name of the given API method.

An example of this case is shown in Figure 5. In the thread⁵, the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, is, which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learned by the API relevance classifier, FACOS is able to detect that the textual content and code snippet around the word is are irrelevant to the API comment and implementation code. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

⁵ https://stackoverflow.com/questions/16919751

Fig. 6. The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn in Figure 6a and the textual content (i.e., the title) of thread 56135373 in Figure 6b.

B Case where FACOS fail to exclude irrelevant threads

Figure 7 shows a case where FACOS fail to exclude thethread6 out of the relevant results for the given API methodorgmockitoMockitomock The issue occurs when there is anAPI method that has a similar functionality as the given APImethod These two methods usually have the same simplename and highly similar functionality description

In Figure 7 PowerMock and Mockito perform similarfunctions such as mocking (ie creating a version of aservice in order to quickly and reliably run tests on thatservice7) Since both of them have the API method whosesimple name is mock and both their mock methods havethe same API signature (ie parameters return type) itwould be easy to mistakenly recognize one as the other evenfor a human Figure 8 shows the comment of API methodorgmockitoMockitomock8 which is Creates mock object ofgiven class or interface Because of the similarity betweenthe API method from Mockito library and the title of thread30127057 in Figure 7 FACOS wrongly recognizes that thesimple API name mock in the thread refers to the given APImethod orgmockitoMockitomock In fact the simple APIname mock refers to the one from PowerMock library

6 https://stackoverflow.com/questions/30127057

7 https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing

8 https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7. Thread 30127057 on Stack Overflow that FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8. The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. For some libraries, method calls are chained (as in Figure 4), so the type is not displayed, and a type system is required to determine it. In addition, the type system is not 100 percent reliable due to a lack of import information, variables, etc. Relying only on the similarity of word representations provided by language models is also ineffective for API content search, which differs from other tasks such as code search. In code search, because the terms “transfer” and “convert” are both related to currency, the method named “transferSGDToUSD” might be the answer to the query “convert SGD to USD”. However, searching for an API method requires an explicit mention of the API name in the query, which can be difficult because different APIs may share the same API method names. Solving the problem solely with a model that learns only the syntactic meaning of textual and code content is therefore insufficient. We thus incorporated semantic information for API content search, which produced better results, as demonstrated by our experiments.

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtain our dataset from another work. We also run the baseline using the code provided by its authors. We checked our code multiple times to ensure that we did not make mistakes. We believe there is little threat to internal validity. We also release the dataset and the code for our experiments for all to use.

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. This experiment mainly focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also discuss API issues. One potential platform is Reddit, which has subreddits (i.e., places gathering Reddit’s threads that discuss a particular topic) on programming languages and frameworks. Like Stack Overflow, its threads have a title, textual content, and code content. This similarity suggests that we can potentially apply FACOS to Reddit too. We leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) required for the approach can be found in other programming languages. We therefore leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task. Many works in software engineering have used precision, recall, and F1-score as the evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threat is minimal.

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. There are two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As suggested by its name, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches such as the Vector Space Model and Latent Semantic Indexing to disambiguate API mentions [18]–[20], while others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API in a thread, the disambiguated API is relevant to the thread, as the thread is talking about the API.


B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding ability of existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. As opposed to assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a single user query into a fixed-length context vector to generate an API sequence.

Other studies have also exploited different aspects of APIs and natural language to better retrieve APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built DistilBERT-based argumentation retrieval for answering comparative questions. Dibia [34] and Vale and Maia [35] developed, respectively, a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code search and completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API.

C. Contribution of Stack Overflow for API documentation

Treude and Robillard [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas Squire [38] examined the utility of Stack Overflow as social media for developer support. Ahasanuzzaman et al. [39] and Baltes et al. [40] worked, respectively, on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow. The dichotomy of these studies is notable: while some research, such as [41] and [42], studies how API documentation fails and how APIs are misused on Stack Overflow, other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow, for example for automated API documentation with tutorials. Similarly, crowdsourced knowledge was hailed by [45] and [46], who explored innovation diffusion through link sharing and web resource recommendation for hyperlinks on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find Stack Overflow threads about a particular API, which can then be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

There are several works focused on the disambiguation task [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have addressed many problems by resolving lexical ambiguities. The task of word sense disambiguation is to identify a target word’s intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Researchers have associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Researchers have also used named entity recognition and disambiguation to add meaning to social network posts. Entity-fishing tools were also developed to facilitate recognition and disambiguation services. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We added a weight parameter to balance the use of syntactic and semantic information for retrieving API mentions and related threads, and demonstrated the utility of this weighting factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package. The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6.

REFERENCES

[1] M. P. Robillard, “What makes apis hard to learn? answers from developers,” IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, “What do client developers concern when using web apis? an empirical study on developer forums and stack overflow,” in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “How do api changes trigger stack overflow discussions? a study on the android sdk,” in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, “Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow,” Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, “Automatic mining of opinions expressed about apis in stack overflow,” IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, “Disambiguating mentions of api methods in stack overflow via type scoping,” in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “Api method recommendation without worrying about the task-api knowledge gap,” in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, “Rack: Automatic api recommendation using crowdsourced knowledge,” in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., “Codebert: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “Electra: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, “Apireal: an api recognition and linking approach for online developer forums,” Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, “Identifying self-admitted technical debt in open source projects using text mining,” Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, “Categorizing the content of github readme files,” Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, “Learning from examples to find fully qualified names of api elements in code snippets,” in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, “Recovering traceability links between code and documentation,” TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, “Recovering documentation-to-source-code traceability links using latent semantic indexing,” in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, “Linking e-mails and source code artifacts,” in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, “Recovering traceability links between an api and its learning resources,” in 2012 34th international conference on software engineering (icse). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, “Live api documentation,” in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, “Statistical learning of api fully qualified names in code snippets of online forums,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, “Codehow: Effective code search based on api understanding and extended boolean model (e),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, “Deep api learning,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, “Complementing global and local contexts in representing api descriptions to improve api retrieval tasks,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, “Leveraging usage similarity for effective retrieval of examples in code repositories,” in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE ’10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, “From word embeddings to document similarities for improved information retrieval in software engineering,” in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, “Exploiting user knowledge during retrieval of semantically annotated api operations,” in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR ’11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “Api method recommendation without worrying about the task-api knowledge gap,” in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, “Trans^3: A transformer-based framework for unifying code summarization and code search,” arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, “Api2com: On the improvement of automatically generated code comments using api documentations,” arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, “Distilbert-based argumentation retrieval for answering comparative questions,” Working Notes of CLEF, 2021.

[34] V. Dibia, “Neuralqa: A usable library for question answering (contextual query expansion + bert) on large datasets,” arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, “Towards a question answering assistant for software development using a transformer-based language model,” arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, “An empirical study on the usage of transformer models for code completion,” arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, “Augmenting api documentation with insights from stack overflow,” in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, “‘Should we move to stack overflow?’ measuring the utility of social media for developer support,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, “Classifying stack overflow posts on api issues,” in 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, “Contextual documentation referencing on stack overflow,” IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, “How api documentation fails,” IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, “Are code examples on an online q&a forum reliable?: a study of api misuse on stack overflow,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowdsourced knowledge on stack overflow: A systematic mapping study,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, “Automated api documentation with tutorials generated from stack overflow,” in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, “A study of innovation diffusion through link sharing on stack overflow,” in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, “From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, “Statistical translation of english texts to api code templates,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, “Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance,” International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, “Geotxt: A web api to leverage place references in text,” in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, “Senserelate::targetword-a generalized framework for word sense disambiguation,” in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation,” in 2015 International Conference on Control Communication & Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, “entity-fishing: a dariah entity recognition and disambiguation service,” Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, “Do we need entity-centric knowledge bases for entity disambiguation?” in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, “Geek: Incremental graph-based entity disambiguation,” in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, “Combining heterogeneous classifiers for word sense disambiguation,” in Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, “A fully unsupervised word sense disambiguation method using dependency knowledge,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.



[Figure 1: A user or tool queries FACOS using the FQN of a given API method. Step 1 collects the Potential Threads, the API Candidates, and the API comment and implementation code. Step 2 applies DATYS+ and the API Relevance Classifier. The predicted relevant threads are returned to the user or tool.]

Fig. 1. The architecture of FACOS (Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis).

III. APPROACH OVERVIEW

A. Task Definition

Our goal is to find Stack Overflow threads that mention a given API method (footnote 1). Specifically, given an API method, we strive to find Stack Overflow threads containing words matching the simple name of the given API method. In Java, the simple name of an API method is the name of the method without the class and package names. For example, m is the simple name of API method com.example.Class.m. We want to classify whether a thread containing the simple name m is actually relevant to API method com.example.Class.m. In summary, the task is defined as: “For each API method in a set of given API methods, identify Stack Overflow threads that refer to it.”
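As a minimal illustration of the simple-name notion used in the task definition, the sketch below extracts the simple name from a fully qualified name by taking its last dot-separated term. The helper name simple_name is hypothetical and is not part of FACOS.

```python
def simple_name(fqn: str) -> str:
    """Return the last dot-separated term of a fully qualified method name."""
    # rsplit with maxsplit=1 also handles a name that contains no dot at all.
    return fqn.rsplit(".", 1)[-1]


print(simple_name("com.example.Class.m"))       # m
print(simple_name("org.mockito.Mockito.mock"))  # mock
```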

B. Architecture

The pipeline of FACOS is presented in Figure 1. It is divided into two main steps: (1) collecting various API-related resources from a given API method name, and (2) recommending relevant threads using the collected API-related resources.

1 In this paper, we use the terms API and API method interchangeably.

In step (1), FACOS finds Potential Threads from Stack Overflow using the simple name of the given API method as the query. Potential Threads are the threads that have at least one word matching the simple name of the given API. The API method comment and implementation code are obtained directly from the source code repository of the given API. Last but not least, the API Candidates are obtained from a database of API methods. The API Candidates are API methods that have the same simple name as the given API method.
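The collection of Potential Threads and API Candidates in step (1) can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionary of threads, the list standing in for the API database, the regular-expression word splitting, and the function names are all hypothetical, and the thread texts are made-up examples rather than data from the paper.

```python
import re
from typing import Dict, List


def simple_name(fqn: str) -> str:
    """Last dot-separated term of a fully qualified method name."""
    return fqn.rsplit(".", 1)[-1]


def find_potential_threads(threads: Dict[str, str], fqn: str) -> List[str]:
    """Return ids of threads with at least one word equal to the simple name."""
    target = simple_name(fqn)
    hits = []
    for thread_id, content in threads.items():
        # Assumption: a "word" is an identifier-like token.
        words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", content)
        if target in words:
            hits.append(thread_id)
    return hits


def find_api_candidates(api_database: List[str], fqn: str) -> List[str]:
    """Return all API methods sharing the simple name of the given method."""
    target = simple_name(fqn)
    return [api for api in api_database if simple_name(api) == target]


# Made-up inputs for illustration only.
threads = {
    "t1": "How do I mock a final class with PowerMock?",
    "t2": "Sorting a list in Java",
}
api_database = [
    "org.mockito.Mockito.mock",
    "org.powermock.api.mockito.PowerMockito.mock",
    "org.mockito.Mockito.when",
]
given = "org.mockito.Mockito.mock"
potential = find_potential_threads(threads, given)
candidates = find_api_candidates(api_database, given)
print(potential)
print(candidates)
```

Note that thread t1 is a Potential Thread for org.mockito.Mockito.mock even though it discusses PowerMock; deciding whether such a thread really refers to the given API is the job of step (2).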

The objective of step (2) is to identify whether each Stack Overflow thread in the Potential Threads actually refers to the given API. FACOS has two components: the API relevance classifier and DATYS+. The API relevance classifier is designed to capture the relevance between a thread and an API method by measuring the semantic similarity between (1) the paragraphs and code snippets in the thread and (2) the API method comment and implementation code. It outputs a semantic relevance score representing the relevance it measures. In contrast, DATYS+ outputs a syntactic relevance score based on the existence of the terms from the fully qualified name of the given API in different scopes of a Stack Overflow thread. For example, API “A.B.C” has the terms “A”, “B”, and “C”. The last term, “C”, is the simple name of the API method. The second-to-last term, “B”, is the type of the API method. Both DATYS and DATYS+ use type scoping [6] to assign a score based on the existence of the type of the API (i.e., “B” in the example) in different scopes of the Stack Overflow thread (code scope, text scope, etc.). However, the type scoping of DATYS+ is modified to suit the search task, as described in Section IV-A. It outputs a score that indicates the syntactic relevance between the given API and the thread; we call it the DATYS+ score.

After step (2), each thread has a score indicating whether the thread refers to the given API method. This score combines the semantic relevance score and the DATYS+ score, and is called the joint relevance score. Threads predicted as referring to the given API method are then returned to the user. We describe the FACOS components (i.e., DATYS+ and the API relevance classifier) in detail in Section IV.

IV. FACOS

FACOS consists of two main components: DATYS+ and the API relevance classifier. DATYS+ takes Potential Threads and API Candidates as inputs and outputs scores indicating its confidence that the given API is referred to in the threads (Section IV-A). Given Potential Threads and the API method comment and implementation code, FACOS first converts them into API relevance embeddings (Section IV-B). The API relevance embeddings are input to the API relevance classifier, which outputs confidence scores indicating the likelihood that the given threads refer to the API (Section IV-C). Finally, the scores from DATYS+ and the API relevance classifier are combined into a joint relevance score, and threads with scores larger than a threshold are returned as the relevant threads (Section IV-D).
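To make the final combination step concrete, the sketch below shows one plausible way to merge the two scores and apply a threshold. The exact formula is defined in Section IV-D and is not reproduced here, so the convex combination, the parameter names weight and threshold, and the function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


def joint_relevance_score(semantic: float, syntactic: float, weight: float) -> float:
    """Assumed form: weighted sum of the semantic and DATYS+ (syntactic) scores."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * semantic + (1.0 - weight) * syntactic


def select_relevant(scores: Dict[str, Tuple[float, float]],
                    weight: float, threshold: float) -> List[str]:
    """Keep threads whose joint score is larger than the threshold."""
    return [
        thread_id
        for thread_id, (semantic, syntactic) in scores.items()
        if joint_relevance_score(semantic, syntactic, weight) > threshold
    ]


# Hypothetical (semantic, syntactic) scores, both already in [0, 1].
example_scores = {"t1": (0.9, 1.0), "t2": (0.3, 0.0)}
selected = select_relevant(example_scores, weight=0.5, threshold=0.5)
print(selected)
```

With weight set to 1 only the classifier score is used, and with weight set to 0 only the DATYS+ score is used, which is the kind of trade-off a weighting factor is meant to expose.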

A. DATYS+

DATYS+ is an extension of DATYS. DATYS used regular expressions to capture the types of API method invocations available in code snippets of the thread. However, these regular expressions are limited, and thus DATYS may miss some mentions in code snippets. To capture more types, DATYS+ modifies the type scoping algorithm by adding a new score.

Algorithm 1 shows how the modified type scoping works. Compared to that of DATYS, the type scoping algorithm of DATYS+ receives CodeSnippets as an additional input. CodeSnippets represents the content available in code snippets of the Stack Overflow thread. The inputs of the original type scoping algorithm are also used: ApiMention, PTypesList, APIMethodCandidate, and ThreadContent stand for the simple name of the given API, the list of possible types extracted from code snippets following the algorithm used by DATYS, the API Candidates, and the thread’s textual content (i.e., title, text, tags), respectively. The three scopes used by DATYS are also used in DATYS+. In Mention Scope (Lines 3-8), DATYS+ increases an API score if its type appears within the API mention. In Text Scope (Lines 10-13), DATYS+ increases an API score if its type appears within the textual content of the thread. In Code Scope (Lines 17-21), DATYS+ increases an API score if its type matches the type of a method invocation or an imported type in the code snippet. Additionally, in Code Scope, DATYS+ also looks at the content of the code snippets and increases the API score of the corresponding API candidate if there are tokens in the code snippets that match the API type (Lines 14-16). This score helps to capture the occurrences of types that would be missed by the more exact matching used in DATYS. Thus, we call the scope of this score Extended Code Scope.
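The scoring logic of Algorithm 1 can be sketched in Python as follows. The helper routines of the algorithm (hasPrefix, getPrefix, getType, isSameType, tokenize) are not specified in this section, so the versions below are simplifying assumptions: a prefix is whatever precedes the last dot of the mention, the type is the second-to-last term of the fully qualified name, type equality is exact string equality, and tokens are identifier-like strings. The guard for a candidate without a type term is also an addition of this sketch.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: identifier-like tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)


def get_type(candidate_fqn: str) -> str:
    """Assumed getType: second-to-last term of the fully qualified name."""
    parts = candidate_fqn.split(".")
    return parts[-2] if len(parts) >= 2 else ""


def score_candidate(api_mention: str, p_types_list: List[str],
                    candidate_fqn: str, thread_content: str,
                    code_snippets: str) -> int:
    score = 0
    cand_type = get_type(candidate_fqn)
    if not cand_type:
        return score  # sketch-only guard: no type term to match
    # Mention Scope: the mention is written with a prefix ending in the type.
    if "." in api_mention:
        prefix = api_mention.rsplit(".", 1)[0]
        if prefix.endswith(cand_type):
            score += 1
    # Text Scope: the type appears in the title, text, or tags.
    if cand_type in tokenize(thread_content):
        score += 1
    # Extended Code Scope: the type appears as any token in the code snippets.
    if cand_type in tokenize(code_snippets):
        score += 1
    # Code Scope: the type matches a type extracted from the code snippets.
    for p_type in p_types_list:
        if p_type == cand_type:
            score += 1
    return score


# Made-up thread content for illustration only.
text = "How to use Mockito mock with generics"
code = "Foo foo = Mockito.mock(Foo.class);"
mockito_score = score_candidate("Mockito.mock", ["Mockito"],
                                "org.mockito.Mockito.mock", text, code)
powermock_score = score_candidate("Mockito.mock", ["Mockito"],
                                  "org.powermock.api.mockito.PowerMockito.mock",
                                  text, code)
print(mockito_score, powermock_score)
```

In this made-up example the Mockito candidate is rewarded in every scope, while the PowerMockito candidate receives no score because its type never appears as a whole token.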

After executing type scoping, DATYS+ returns scores for the API Candidates. The scores are then normalized to the range [0, 1] using the minimum and the maximum scores among the API Candidates. DATYS+ then takes the normalized score of the given API method and passes it to the next step.
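The normalization step corresponds to standard min-max scaling over the candidate scores. A minimal sketch is given below; the function name is hypothetical, and the handling of the degenerate case in which all candidates share the same score is an assumption, since the text does not specify it.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale candidate scores to [0, 1] using their minimum and maximum."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # Assumption: with identical scores, every candidate is mapped to 0.
        return {api: 0.0 for api in scores}
    return {api: (value - low) / (high - low) for api, value in scores.items()}


# Hypothetical raw type-scoping scores for three candidates.
raw = {
    "org.mockito.Mockito.mock": 4,
    "org.powermock.api.mockito.PowerMockito.mock": 1,
    "com.example.Other.mock": 0,
}
normalized = min_max_normalize(raw)
given_api_score = normalized["org.mockito.Mockito.mock"]
print(normalized)
```

Only the normalized score of the given API method (given_api_score above) is passed on; the other candidates serve as the reference range.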

Algorithm 1: Scoring an API Candidate with Type Scoping in DATYS+
Input: ApiMention, PTypesList, APIMethodCandidate, ThreadContent, CodeSnippets
Output: CandScore
 1: CandScore = 0
 2: CandType = getType(APIMethodCandidate)
 3: if hasPrefix(ApiMention) then
 4:     Prefix = getPrefix(ApiMention)
 5:     if endsWith(Prefix, CandType) then
 6:         CandScore = CandScore + 1
 7:     end if
 8: end if
 9: TextualTokens = tokenize(ThreadContent)
10: CodeTokens = tokenize(CodeSnippets)
11: if CandType in TextualTokens then
12:     CandScore = CandScore + 1
13: end if
14: if CandType in CodeTokens then
15:     CandScore = CandScore + 1
16: end if
17: for PType in PTypesList do
18:     if isSameType(PType, CandType) then
19:         CandScore = CandScore + 1
20:     end if
21: end for
22: return CandScore

B. API relevance embedding

We follow the process described in Figure 2 to build the API relevance embedding. Firstly, each thread in Potential Threads needs to be converted into an embedding. A thread may contain m paragraphs and n code snippets. A paragraph is a piece of textual content in a Stack Overflow thread that is separated from other content in the thread by a newline character. A code snippet is a piece of code content in a Stack Overflow thread. It is typically enclosed by a starting tag 〈pre〉〈code〉 and an ending tag 〈/code〉〈/pre〉. Each paragraph is paired with each code snippet to create a thread content pair. Therefore, a Stack Overflow thread has m × n thread content pairs. A natural-programming language model, CodeBERT (footnote 2), is used to extract the semantic meaning of each thread content pair. It encodes the m × n thread content pairs into m × n thread embeddings; a thread embedding is the representation vector of a thread content pair created by CodeBERT’s encoder.

2 https://github.com/microsoft/CodeBERT

By converting the pairs from a textual form to a numerical vector form with a pre-trained CodeBERT model, the semantic relationship between the paragraphs and code snippets is extracted. Before feeding the thread content pairs into the encoder of CodeBERT, each pair is pre-processed into the following format:

〈CLS〉 paragraph 〈SEP〉 code snippet 〈EOS〉

〈CLS〉 is the token that marks the start of the pair, according to the design of the RoBERTa model [13], on which CodeBERT is based. 〈SEP〉 is the token that separates a paragraph from a code snippet, and 〈EOS〉 indicates the end of the pair. In detail, the maximum number of tokens in a pair before being fed into the CodeBERT encoder is 512. We set the number of tokens for a paragraph and a code snippet to 254 and 255 tokens, respectively. The two numbers add up to 512 when the three special tokens 〈CLS〉, 〈SEP〉, and 〈EOS〉 are counted. If the number of tokens in the paragraph is less than 254, then padding tokens are added to reach 254 tokens. On the other hand, if the number of tokens in the paragraph is more than 254, we truncate the paragraph and take the first 254 tokens. The same process is applied to the code snippet with 255 tokens. The CodeBERT encoder receives these thread content pairs under this format as inputs and outputs embedding vectors. For a thread with m × n thread content pairs, m × n thread embedding vectors are created, and each thread embedding vector has a length of 768.
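The token budget described above can be illustrated with a minimal Python sketch. It works on already-tokenized lists and uses placeholder strings for the special and padding tokens; the real token strings and the tokenizer are assumptions that depend on the CodeBERT tooling, and the function names are hypothetical.

```python
from typing import List

# Placeholder strings for the special tokens named in the text; the
# actual token strings depend on the tokenizer and are an assumption here.
CLS, SEP, EOS, PAD = "<CLS>", "<SEP>", "<EOS>", "<PAD>"

MAX_PARAGRAPH_TOKENS = 254
MAX_CODE_TOKENS = 255


def fit_to_length(tokens: List[str], length: int) -> List[str]:
    """Keep the first `length` tokens, or right-pad with PAD up to `length`."""
    clipped = tokens[:length]
    return clipped + [PAD] * (length - len(clipped))


def build_pair(paragraph_tokens: List[str], code_tokens: List[str]) -> List[str]:
    """Lay out one thread content pair as CLS, paragraph, SEP, code, EOS."""
    paragraph = fit_to_length(paragraph_tokens, MAX_PARAGRAPH_TOKENS)
    code = fit_to_length(code_tokens, MAX_CODE_TOKENS)
    return [CLS] + paragraph + [SEP] + code + [EOS]


# A short paragraph is padded; an over-long snippet is truncated.
pair = build_pair("this works like a charm".split(), ["x"] * 300)
print(len(pair))  # 1 + 254 + 1 + 255 + 1 = 512
```

Whether padding is placed directly after each segment or handled by the tokenizer is not stated in the text; the sketch pads each segment separately only to make the 254 + 255 + 3 = 512 arithmetic visible.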

Secondly, to build the API relevance embedding, the API comment and implementation code also need to be converted into an embedding. The API method comment is a piece of textual content that describes the functionality of the API method and how to use it. The API implementation code is the code inside the API method body that implements the described functionality. The API comment and implementation code are


[Figure 2: the m paragraphs and n code snippets of a thread form m × n thread content pairs, and the comment and implementation code of the given API method form one method content pair. CodeBERT encodes them into m × n thread embeddings and one method embedding (768 dimensions each), which are concatenated into m × n API relevance embeddings (1536 dimensions each). The label is whether the thread refers to the given API method.]

Fig. 2. How API relevance embeddings are created.

extracted from the Javadoc and the JAR files, respectively. They are pre-processed to the following format:

〈CLS〉 comment 〈SEP〉 implementation code 〈EOS〉

They are then transformed into a numerical representation vector via the CodeBERT encoder.

Finally, each thread embedding vector is concatenated with the method embedding vector. We call the concatenated vector an API relevance embedding. In total, m × n API relevance embedding vectors are created.
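As a rough illustration of the concatenation step, the sketch below uses plain Python lists in place of real encoder outputs. The function name and the toy vectors are hypothetical; only the 768 and 1536 dimensions come from the text.

```python
from typing import List

EMBEDDING_DIM = 768  # length of each CodeBERT embedding, per the text


def build_relevance_embeddings(
    thread_embeddings: List[List[float]],
    method_embedding: List[float],
) -> List[List[float]]:
    """Append the single method embedding to every thread embedding."""
    if len(method_embedding) != EMBEDDING_DIM:
        raise ValueError("method embedding must have 768 values")
    relevance = []
    for thread_embedding in thread_embeddings:
        if len(thread_embedding) != EMBEDDING_DIM:
            raise ValueError("thread embedding must have 768 values")
        relevance.append(thread_embedding + method_embedding)
    return relevance


# Toy vectors standing in for encoder outputs: m = 2 paragraphs, n = 3 snippets.
m, n = 2, 3
thread_embeddings = [[float(i)] * EMBEDDING_DIM for i in range(m * n)]
method_embedding = [-1.0] * EMBEDDING_DIM
relevance_embeddings = build_relevance_embeddings(thread_embeddings, method_embedding)
print(len(relevance_embeddings), len(relevance_embeddings[0]))  # 6 1536
```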

C. API relevance classifier

The API relevance classifier is a binary classifier that utilizes a neural network with two fully connected layers to predict whether the API relevance embedding comes from a Stack Overflow thread that refers to the given API method.

The API relevance classifier has two modes of operation: training and deployment. In the training mode, the API relevance embeddings are used to train the API relevance classifier. When there is an imbalance between positive and negative labels, the API relevance classifier upsamples the minority label. Whenever the thread refers to the given API method, all API relevance embeddings created from the thread are considered positive by the classifier. Otherwise, if the given API method is not referred to by the thread, every API relevance embedding of the thread has a negative label.
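One simple way to upsample the minority label is to duplicate randomly chosen minority examples until both classes have the same size. The sketch below shows that idea only; the function name, the fixed seed, and the exact sampling scheme are assumptions rather than details taken from the paper.

```python
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (API relevance embedding, label)


def upsample_minority(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicate minority-class examples until both classes match."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    if not positives or not negatives:
        return list(examples)  # nothing to balance against
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(examples) + extra


# Two positive and eight negative toy examples become eight of each.
data = [([0.1], 1)] * 2 + [([0.9], 0)] * 8
balanced = upsample_minority(data)
print(sum(label for _, label in balanced), len(balanced))  # 8 16
```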

[Figure 3: given a thread and an API as input, DATYS+ outputs score A of the given API method and the API relevance classifier outputs the average probability B of the positive class. The two are combined into the joint relevance score C = x × A + (1 − x) × B. If C is greater than the threshold, the thread refers to the API; otherwise, it does not.]

Fig. 3. Computing joint relevance score.

In the deployment mode, the API relevance classifier produces probability scores for the m × n API relevance embeddings. These scores are averaged and passed to the next step. The averaged score indicates the likelihood that the thread refers to the given API.

D. Computing joint relevance score

We follow the process in Figure 3 to compute the joint relevance score. DATYS+ and the API relevance classifier output scores A and B, respectively. Both represent their confidence that the given API method is mentioned in the thread. The two scores are then combined into a joint relevance score C following this formula:

$C = x \times A + (1 - x) \times B$    (1)

The weighting factor x decides the contributions of the DATYS+ score and the API relevance classifier score to the joint relevance score C. The higher the value of x, the more the DATYS+ score contributes to the final joint relevance score. A, B, and x all range from 0 to 1. A thread is considered to refer to the given API if the joint relevance score C is larger than a threshold t. Otherwise, the thread is considered not to refer to the given API. By default, t is set to 0.5.
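Equation (1) and the threshold rule can be written down directly. The following Python sketch also includes the averaging of per-embedding probabilities that yields score B; the function names are hypothetical and the input values are made up for illustration.

```python
from typing import List


def average_probability(positive_probs: List[float]) -> float:
    """Score B: mean positive-class probability over the m x n embeddings."""
    if not positive_probs:
        raise ValueError("a thread needs at least one API relevance embedding")
    return sum(positive_probs) / len(positive_probs)


def joint_relevance_score(a: float, b: float, x: float) -> float:
    """Equation (1): C = x * A + (1 - x) * B, with A, B, x in [0, 1]."""
    for name, value in (("A", a), ("B", b), ("x", x)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return x * a + (1.0 - x) * b


def refers_to_api(a: float, b: float, x: float, threshold: float = 0.5) -> bool:
    """A thread refers to the API when C is strictly larger than the threshold."""
    return joint_relevance_score(a, b, x) > threshold


# Illustrative values only: B = 0.5, so C = 0.25 * 1.0 + 0.75 * 0.5 = 0.625.
b = average_probability([0.25, 0.75])
print(joint_relevance_score(1.0, b, 0.25))  # 0.625
print(refers_to_api(1.0, b, 0.25), refers_to_api(0.0, b, 0.25))  # True False
```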

The value of x is estimated based on the training data. In detail, we let x increase gradually from 0 to 1 with a step of 0.1. There are eleven possible values of x:


TABLE I. Number of API relevance embeddings in each set

Set             API relevance embeddings
Training set    57,690
Testing set     26,212

0, 0.1, 0.2, ..., 0.9, 1.0. The value of x giving the highest performance in the training data is then chosen.
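The selection of x amounts to a small grid search. Below is a minimal sketch in Python, assuming a caller-supplied function that returns the average training F1-score for a given x. The evaluator used in the example is a toy stand-in, not the paper's measurement, and the tie-breaking rule (keep the smallest x) is an assumption.

```python
from typing import Callable, List, Tuple


def candidate_weights(steps: int = 10) -> List[float]:
    """Return 0, 0.1, ..., 1.0, built from integers to avoid float drift."""
    return [i / steps for i in range(steps + 1)]


def choose_weight(evaluate_f1: Callable[[float], float]) -> Tuple[float, float]:
    """Return (x, f1) with the highest training F1; ties keep the smallest x."""
    best_x, best_f1 = 0.0, float("-inf")
    for x in candidate_weights():
        f1 = evaluate_f1(x)
        if f1 > best_f1:
            best_x, best_f1 = x, f1
    return best_x, best_f1


# Toy evaluator that peaks at x = 0.3; a real one would score FACOS on
# the training threads for the given x.
best_x, best_f1 = choose_weight(lambda x: 1.0 - abs(x - 0.3))
print(len(candidate_weights()), best_x)  # 11 0.3
```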

V. EXPERIMENT

A. Dataset and Experimental Settings

We utilize the dataset provided in the DATYS work [6] to evaluate both FACOS and DATYS. We split the 380 Stack Overflow threads into 253 training threads and 127 testing threads, a ratio of approximately 2:1. The training threads are utilized to train the API relevance classifier, while the testing threads are used to evaluate FACOS and DATYS. Next, as mentioned in Section IV-B, for each Stack Overflow thread in the training threads, we extract its thread embeddings, and these thread embeddings are grouped into a training set. Similarly, for each Stack Overflow thread in the testing threads, we extract its thread embeddings, and these thread embeddings are grouped into a testing set. The numbers of API relevance embeddings of the dataset are shown in Table I. The numbers of embeddings in the training set and testing set are 57,690 and 26,212, respectively.

To generate API relevance embeddings for training the API relevance classifier, for each thread, if the given API appears in the thread, we generate API relevance embeddings for thread contents and method contents as described in Section IV-B. These embeddings have a positive label because they are created from the API that is referred to by the thread. To generate embeddings with a negative label for a thread, we find APIs that have the same simple name as the given API and are not mentioned in the thread. We then create API relevance embeddings from these APIs and label them as negative.
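The choice of negative APIs described above can be sketched as a filter over fully qualified names. In the Python sketch below, the function names are hypothetical, the `com.example` entries are invented for illustration, and the set of APIs mentioned in the thread is assumed to come from the dataset's ground truth.

```python
from typing import Iterable, List, Set


def simple_name(fully_qualified_name: str) -> str:
    """Return the last dot-separated component, e.g. 'mock'."""
    return fully_qualified_name.rsplit(".", 1)[-1]


def negative_candidates(
    given_api: str,
    known_apis: Iterable[str],
    apis_in_thread: Set[str],
) -> List[str]:
    """APIs sharing the given API's simple name that the thread does not mention."""
    target = simple_name(given_api)
    return [
        api
        for api in known_apis
        if api != given_api
        and simple_name(api) == target
        and api not in apis_in_thread
    ]


# Hypothetical API inventory; only org.mockito.Mockito.mock appears in the paper.
known = [
    "org.mockito.Mockito.mock",
    "com.example.FakeMocker.mock",
    "com.example.FakeMocker.verify",
]
negatives = negative_candidates(
    "org.mockito.Mockito.mock", known, {"org.mockito.Mockito.mock"}
)
print(negatives)  # ['com.example.FakeMocker.mock']
```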

There are 344 APIs in the training set; they are used to generate the training method embeddings. In the testing set, there are 181 APIs; they are used to generate the testing method embeddings. Table II shows the numbers of positive and negative API relevance embeddings created in the training and testing sets. The number of negative API relevance embeddings is approximately 4 times the number of positive ones in the same set of threads. Due to this imbalance, positive API relevance embeddings are randomly up-sampled to balance the two classes within the API relevance classifier training process.

The API relevance classifier is trained for 6 epochs on the training data. After the first 6 epochs, the value of the loss function has largely converged. The learning rate of the training is set to $10^{-3}$.

B. Metrics

To evaluate the proposed approach on identifying threads that are relevant to an API, we use three metrics: Precision,

TABLE II. Number of positive and negative API relevance embeddings in each set

Set                                    Embeddings
Positive Embeddings in Training set    9,934
Negative Embeddings in Training set    47,756
Positive Embeddings in Testing set     5,607
Negative Embeddings in Testing set     20,605

Recall, and F1-score. To calculate these three metrics, True Positive, False Positive, and False Negative should be defined first. Our task focuses on finding threads that actually refer to a given API. A True Positive is the case where a thread that is deemed relevant by the approach is indeed relevant. A False Positive is the case where a thread that is deemed relevant by the approach is actually irrelevant. A False Negative is the case where a thread that is deemed irrelevant by the approach is actually relevant. The metrics are calculated using the following formulas:

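$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$    (2)

$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$    (3)

$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$    (4)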

We measure the above scores for all given APIs in the testing set and report the averages of the scores.
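Equations (2) to (4) and the per-API averaging can be sketched in a few lines of Python. The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption, since the text does not say how such cases are handled.

```python
from typing import Iterable, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Equations (2)-(4); a zero denominator yields 0.0 (an assumption)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(per_api_counts: Iterable[Tuple[int, int, int]]) -> Tuple[float, ...]:
    """Average precision, recall, and F1 over APIs, one (tp, fp, fn) per API."""
    scores = [precision_recall_f1(*counts) for counts in per_api_counts]
    if not scores:
        raise ValueError("need counts for at least one API")
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Two made-up APIs: one with a false positive, one with a false negative.
avg_p, avg_r, avg_f1 = macro_average([(1, 1, 0), (1, 0, 1)])
print(round(avg_p, 4), round(avg_r, 4), round(avg_f1, 4))  # 0.75 0.75 0.6667
```

Because the scores are averaged per API, the average F1-score need not equal the harmonic mean of the average precision and average recall, as the toy example shows.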

C. Research Questions

Research Question 1: Can FACOS perform better than the baseline (DATYS)?
The baseline DATYS was designed for the task of API mention disambiguation. We adapt it to our task of finding threads that are relevant to an API. If DATYS finds that an API is mentioned in the thread, the thread is considered to be relevant to the API. To evaluate the improvement of FACOS over DATYS, we evaluate them on the testing data set and compare them in terms of F1-score. We also analyse some cases that FACOS can resolve and DATYS cannot in Section VII-A.

Research Question 2: How well does each component of FACOS perform?
There are three possible variants of FACOS, depending on which components it includes. The variants are (1) FACOS with the API relevance classifier only, (2) FACOS with DATYS+ only, and (3) FACOS with both DATYS+ and the API relevance classifier. The API relevance classifier is a semantic-based algorithm, while DATYS+ is a syntactic-based algorithm. In this study, we aim to analyze the contribution of each component of FACOS. From the analysis, we would like to answer the question of whether combining a semantic-based algorithm and a syntactic-based algorithm leads to a better result than running them individually.

Research Question 3: How does the weighting factor affect the F1-score of the relevant thread classification? Does our strategy work well?


TABLE III. FACOS vs. DATYS in terms of F1-score in the testing set

Approach    Avg. Precision    Avg. Recall    Avg. F1-score
DATYS       0.7441            0.7703         0.7340
FACOS       0.8697            0.9016         0.8730

TABLE IV. Contribution of FACOS Components

Components                                     Avg. Precision    Avg. Recall    Avg. F1-score
FACOS                                          0.8697            0.9016         0.8730
FACOS with only API relevance classifier       0.3408            0.3658         0.3408
FACOS with only DATYS+                         0.8620            0.8723         0.8530

The weighting factor is an importance factor that affects how well FACOS performs. We select the weighting factor based on the best performance in the training data. We analyze whether our strategy leads to the best performance in the testing data. We vary the value of the weighting factor in both the training data and the testing data. The values that we use are 0, 0.1, 0.2, ..., 0.9, 1.0. We analyze whether picking the value that leads to the best performance in the training data also leads to the best performance in the testing data.

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. FACOS in general outperforms DATYS. On average, FACOS achieves an F1-score of 0.873, which is an absolute improvement of 13.9% over DATYS (0.8730 versus 0.7340). FACOS also beats DATYS in terms of precision and recall.

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the “API relevance classifier-only” version of FACOS gives the worst result, the API relevance classifier may not be able to resolve the task well independently. Partly, this might be due to the limited amount of training data (i.e., only 253 training threads). In addition, the “DATYS+ only” version of FACOS performs much better than the “API relevance classifier-only” version. However, FACOS is still better than both of them. This demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table V shows the performance of FACOS in the testing set when we vary the value of the weighting factor. The bold numbers in each table are the average F1-scores of the chosen value of x. Similarly, Table VI shows the performance of FACOS in the training set when we vary the value of the weighting factor. The highest F1-score for both the training and testing set is achieved when the value of

TABLE V. Average Precision, average Recall, and average F1-score of testing sets when weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.3420            0.3658         0.3408
0.1    0.4485            0.4653         0.4441
0.2    0.8650            0.8925         0.8641
0.3    0.8697            0.9016         0.8730
0.4    0.8684            0.8934         0.8689
0.5    0.8588            0.8723         0.8097
0.6    0.8588            0.8723         0.8510
0.7    0.8606            0.8723         0.8521
0.8    0.8606            0.8723         0.8521
0.9    0.8606            0.8723         0.8521
1.0    0.8620            0.8723         0.8530

TABLE VI. Average Precision, average Recall, and average F1-score of training sets when weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.6565            0.6685         0.6473
0.1    0.7111            0.7272         0.7080
0.2    0.8261            0.8506         0.8269
0.3    0.8328            0.8498         0.8329
0.4    0.8265            0.8410         0.8254
0.5    0.8159            0.8234         0.8097
0.6    0.8132            0.8234         0.8079
0.7    0.8132            0.8234         0.8079
0.8    0.8132            0.8234         0.8079
0.9    0.8132            0.8234         0.8079
1.0    0.8180            0.8191         0.8073

the weighting factor is equal to 0.3. This demonstrates that our strategy of picking the value of the weighting factor that leads to the best performance in the training data works well.
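The claim can be checked against the reported numbers with a short Python sketch. The F1-scores below are transcribed from Tables V and VI with decimal points restored; the helper name is hypothetical.

```python
# Average F1-scores copied from Table V (testing) and Table VI (training),
# indexed by x = 0, 0.1, ..., 1.0.
WEIGHTS = [i / 10 for i in range(11)]
TRAIN_F1 = [0.6473, 0.7080, 0.8269, 0.8329, 0.8254, 0.8097,
            0.8079, 0.8079, 0.8079, 0.8079, 0.8073]
TEST_F1 = [0.3408, 0.4441, 0.8641, 0.8730, 0.8689, 0.8097,
           0.8510, 0.8521, 0.8521, 0.8521, 0.8530]


def best_weight(f1_scores):
    """Return the weight whose F1 is highest (the first one on ties)."""
    best_index = max(range(len(f1_scores)), key=f1_scores.__getitem__)
    return WEIGHTS[best_index]


chosen = best_weight(TRAIN_F1)  # selected on training data only
test_f1_at_chosen = TEST_F1[WEIGHTS.index(chosen)]
print(chosen, test_f1_at_chosen == max(TEST_F1))  # 0.3 True
```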

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method.

Figure 4 shows an example of a case where the textual content of the thread does not explicitly relate to the given API method. The figure contains a paragraph and a code snippet of the Stack Overflow thread with ID 56135373³. org.mockito.stubbing.OngoingStubbing.thenReturn⁴ is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only shows the user's view of the code snippet, without a description mentioning the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as “This works like

³ https://stackoverflow.com/questions/56135373
⁴ https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html


Fig. 4. Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread.

Fig. 5. Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread.

charm” do not provide much information to identify whether the observed API method refers to the given API.

Therefore, we leverage the content of the thread, which might be relevant to the content of the API method. For example, in the thread above, its title, which is shown in Figure 6a, “Optional cannot be returned by stream() in Mockito Test classes”, relates to the comment of the given API, which is “Sets a return value to be returned when the method is called” in Figure 6b. Due to this feature, FACOS successfully considers this thread as relevant, while DATYS misses it.

(2) The irrelevant thread contains the type name of the given API method.

An example of this case is shown in Figure 5. In the thread⁵, the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, is, which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learnt by the API relevance classifier, FACOS is able to detect the irrelevance between the textual content and code snippet around the word is on one side and the API comment and implementation code on the other. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

⁵ https://stackoverflow.com/questions/16919751

Fig. 6. The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn in Figure 6a and the textual content (i.e., the title) of thread 56135373 in Figure 6b.

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude the thread⁶ from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when there is an API method that has similar functionality to the given API method. These two methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service⁷). Since both of them have an API method whose simple name is mock, and both of their mock methods have the same API signature (i.e., parameters and return type), it would be easy to mistakenly recognize one as the other, even for a human. Figure 8 shows the comment of the API method org.mockito.Mockito.mock⁸, which is “Creates mock object of given class or interface”. Because of the similarity between the comment of the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly recognizes that the simple API name mock in the thread refers to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.

⁶ https://stackoverflow.com/questions/30127057
⁷ https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
⁸ https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7. Thread 30127057 on Stack Overflow, which FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8. The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

The search for an API method and its relevant information cannot always rely on syntactic information. For some libraries, some methods are chained (as in Figure 4), so the type is not displayed. A type system is required to determine the type. In addition, the type system is not 100 percent reliable due to a lack of import information, variables, etc. It is also ineffective to rely only on the similarity of word representations provided by language models for API content search. API content search differs from other tasks such as code search. In code search, because the terms “transfer” and “convert” are both related to currency, the method named “transferSGDToUSD” might be the answer to the query “convert SGD to USD”. However, searching for an API method requires an explicit mention of the API name in the query, which could be difficult as different APIs may share the same API method names. Solving the problem solely with a model that only learns the syntactic meaning of textual and code content is insufficient. Therefore, we incorporated semantic information for API content search, which generated better results, as demonstrated by the results of our experiment.

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtain our dataset from another work. We also run the baseline using the code provided by the author. We then check our code multiple times to ensure that we do not make mistakes. We believe there should be few threats to internal validity. We also release the dataset and the code for our experiments for all to use.

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. This experiment mainly focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also talk about API issues. A potential platform might be Reddit, which has sub-reddits (i.e., places gathering Reddit's threads discussing a particular problem) discussing programming languages and frameworks. It has title, textual, and code content, the same as Stack Overflow. Thus, the similarity suggests that we can potentially apply FACOS to Reddit too. We leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS, although we focus only on Java, the features (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) required for the approach can be found in other programming languages. Therefore, we leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task. Many works in software engineering have used precision, recall, and F1-score as the evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threats are minimal.

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. There are two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As suggested by its name, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches such as Vector Space Model and Latent Semantic Indexing to disambiguate the API mentions [18]–[20], while some others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate the API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API in a thread, the disambiguated API is relevant to the thread, as the thread is talking about the API.


B. API Resource Retrieval

Several studies have explored how to search for API code and retrieve related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding ability of the existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. As opposed to assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a single user query into a fixed-length context vector to generate an API sequence.

Other studies have also exploited different aspects of APIs and natural language to better retrieve the APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built DistilBERT-based argumentation retrieval for answering comparative questions. Dibia et al. [34] and Vale et al. [35] developed a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model, respectively. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code search and completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API that we are searching for.

C. Contribution of Stack Overflow for API documentation

Treude et al. [37] studied augmenting API documentation with insights from Stack Overflow. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas [38] dubbed Stack Overflow the social media for developer support in terms of provided utilities. [39] and [40] worked on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow. The dichotomy of these studies is notable: some research, like [41] and [42], studies how API documentation fails via API misuse on Stack Overflow, while other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow for automated API documentation with tutorials. Similarly, crowdsourced knowledge was hailed by [45] and [46], who explored innovation diffusion and web resource recommendation for hyperlinks through link sharing on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find threads about a particular API in Stack Overflow that can be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

Several works focus on the disambiguation task [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have addressed many problems by resolving lexical ambiguity. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Researchers have associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Researchers used this approach to add meaning to social network posts for named entity recognition and disambiguation. Different entity-fishing tools were also developed to facilitate the recognition and disambiguation service. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We have added a weight parameter to balance the usage of syntactic and semantic information for retrieving API mentions and related threads. We have demonstrated the utility of the weighting factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package: The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, “What makes apis hard to learn? answers from developers,” IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, “What do client developers concern when using web apis? an empirical study on developer forums and stack overflow,” in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “How do api changes trigger stack overflow discussions? a study on the android sdk,” in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, “Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow,” Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, “Automatic mining of opinions expressed about apis in stack overflow,” IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, “Disambiguating mentions of api methods in stack overflow via type scoping,” in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “Api method recommendation without worrying about the task-api knowledge gap,” in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, “Rack: Automatic api recommendation using crowdsourced knowledge,” in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, et al., “Codebert: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “Electra: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, “Apireal: an api recognition and linking approach for online developer forums,” Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, “Identifying self-admitted technical debt in open source projects using text mining,” Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, “Categorizing the content of github readme files,” Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, “Learning from examples to find fully qualified names of api elements in code snippets,” in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, “Recovering traceability links between code and documentation,” TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, “Recovering documentation-to-source-code traceability links using latent semantic indexing,” in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, “Linking e-mails and source code artifacts,” in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, “Recovering traceability links between an api and its learning resources,” in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, “Live api documentation,” in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, “Statistical learning of api fully qualified names in code snippets of online forums,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, “Codehow: Effective code search based on api understanding and extended boolean model (e),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, “Deep api learning,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, pp. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T Nguyen N Tran H Phan T Nguyen L Truong A T NguyenH A Nguyen and T N Nguyen ldquoComplementing global and localcontexts in representing api descriptions to improve api retrieval tasksrdquoin Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations

of Software Engineering ser ESECFSE 2018 New York NY USAAssociation for Computing Machinery 2018 p 551ndash562 [Online]Available httpsdoiorg10114532360243236036

[27] S K Bajracharya J Ossher and C V Lopes ldquoLeveraging usagesimilarity for effective retrieval of examples in code repositoriesrdquo inProceedings of the Eighteenth ACM SIGSOFT International Symposiumon Foundations of Software Engineering ser FSE rsquo10 New YorkNY USA Association for Computing Machinery 2010 p 157ndash166[Online] Available httpsdoiorg10114518822911882316

[28] X Ye H Shen X Ma R Bunescu and C Liu ldquoFrom wordembeddings to document similarities for improved information retrievalin software engineeringrdquo in Proceedings of the 38th InternationalConference on Software Engineering ser ICSE rsquo16 New YorkNY USA Association for Computing Machinery 2016 p 404ndash415[Online] Available httpsdoiorg10114528847812884862

[29] M Roj ldquoExploiting user knowledge during retrieval of semanticallyannotated api operationsrdquo in Proceedings of the Fourth Workshop onExploiting Semantic Annotations in Information Retrieval ser ESAIRrsquo11 New York NY USA Association for Computing Machinery 2011p 21ndash22 [Online] Available httpsdoiorg10114520647132064726

[30] Q Huang X Xia Z Xing D Lo and X Wang ldquoApi methodrecommendation without worrying about the task-api knowledge gaprdquo in2018 33rd IEEEACM International Conference on Automated SoftwareEngineering (ASE) 2018 pp 293ndash304

[31] W Wang Y Zhang Z Zeng and G Xu ldquoTransˆ 3 A transformer-based framework for unifying code summarization and code searchrdquoarXiv preprint arXiv200303238 2020

[32] R. Shahbazi, R. Sharma, and F. H. Fard, “Api2com: On the improvement of automatically generated code comments using api documentations,” arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, “Distilbert-based argumentation retrieval for answering comparative questions,” Working Notes of CLEF, 2021.

[34] V. Dibia, “Neuralqa: A usable library for question answering (contextual query expansion+ bert) on large datasets,” arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, “Towards a question answering assistant for software development using a transformer-based language model,” arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, “An empirical study on the usage of transformer models for code completion,” arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, “Augmenting api documentation with insights from stack overflow,” in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, ““Should we move to stack overflow?” measuring the utility of social media for developer support,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, “Classifying stack overflow posts on api issues,” in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, “Contextual documentation referencing on stack overflow,” IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, “How api documentation fails,” IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, “Are code examples on an online q&a forum reliable?: a study of api misuse on stack overflow,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowdsourced knowledge on stack overflow: A systematic mapping study,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, “Automated api documentation with tutorials generated from stack overflow,” in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, “A study of innovation diffusion through link sharing on stack overflow,” in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, “From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, “Statistical translation of english texts to api code templates,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, “Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance,” International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, “Geotxt: A web api to leverage place references in text,” in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, “Senserelate::Targetword - a generalized framework for word sense disambiguation,” in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation,” in 2015 International Conference on Control Communication & Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, “entity-fishing: a dariah entity recognition and disambiguation service,” Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, “Do we need entity-centric knowledge bases for entity disambiguation?” in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, “Geek: Incremental graph-based entity disambiguation,” in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, “Combining heterogeneous classifiers for word sense disambiguation,” in Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, “A fully unsupervised word sense disambiguation method using dependency knowledge,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.


  • I. Introduction
  • II. Preliminaries
    • II-A. DATYS
    • II-B. CodeBERT
  • III. Approach Overview
    • III-A. Task Definition
    • III-B. Architecture
  • IV. FACOS
    • IV-A. DATYS+
    • IV-B. API relevance embedding
    • IV-C. API relevance classifier
    • IV-D. Computing joint relevance score
  • V. Experiment
    • V-A. Dataset and Experimental Settings
    • V-B. Metrics
    • V-C. Research Questions
  • VI. Result
    • VI-A. RQ1: FACOS Effectiveness
    • VI-B. RQ2: Ablation Study
    • VI-C. RQ3: Effect of the weighting factor
  • VII. Discussion
    • VII-A. Cases where FACOS outperforms DATYS
    • VII-B. Case where FACOS fails to exclude irrelevant threads
    • VII-C. Adding Semantic Information for API Content Search
    • VII-D. Threats to validity
  • VIII. Related Work
    • VIII-A. API Disambiguation
    • VIII-B. API Resource Retrieval
    • VIII-C. Contribution of StackOverflow for API documentation
    • VIII-D. Word Sense and Entity Disambiguation Study
  • IX. Conclusion and Future Work
  • References

available in code snippets of the thread. However, these regular expressions are limited, and thus DATYS may miss some mentions in code snippets. To capture more types, DATYS+ modifies the type scoping algorithm by adding a new score.

Algorithm 1 shows how the modified type scoping works. Compared to DATYS's algorithm, DATYS+'s type scoping algorithm receives CodeSnippets as an additional input. CodeSnippets represents the content available in the code snippets of the Stack Overflow thread. The inputs of the original type scoping algorithm are also used: ApiMention, PTypesList, APIMethodCandidate, and ThreadContent stand for the simple name of the given API, the list of possible types extracted from code snippets following the algorithm used by DATYS, the API candidate, and the thread's textual content (i.e., title, text, tags), respectively. The three scopes used by DATYS are also used in DATYS+. In Mention Scope (Lines 3-8), DATYS+ increases an API's score if its type appears within the API mention. In Text Scope (Lines 10-13), DATYS+ increases an API's score if its type appears within the textual content of the thread. In Code Scope (Lines 17-21), DATYS+ increases an API's score if its type matches the type of a method invocation or an imported type in the code snippet. Additionally, DATYS+ looks at the content of the code snippets and increases the score of the corresponding API candidate if there are tokens in the code snippets that match the API type (Lines 14-16). This score helps to capture occurrences of types that would be missed by the stricter matching used in DATYS. Thus, we call the scope of this score Extended Code Scope.

After executing type scoping, DATYS+ returns scores for the API candidates. The scores are then normalized to the range [0, 1] using the minimum and the maximum scores among the API candidates. DATYS+ then takes the normalized score of the given API method and passes it to the next step.
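The scoring and normalization steps can be illustrated with a minimal Python sketch. It is not the DATYS+ implementation: the regular-expression tokenizer, the helper names, the segment-boundary reading of endsWith, the simple-name comparison standing in for isSameType, and the choice to return 0.0 when all candidates tie are assumptions made only for illustration.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Split text into identifier-like tokens (assumed, simplistic tokenizer)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def simple_type_name(api_candidate: str) -> str:
    """'org.mockito.Mockito.mock' -> 'Mockito' (type that declares the method)."""
    return api_candidate.split(".")[-2]


def score_candidate(api_mention: str, p_types: List[str], api_candidate: str,
                    thread_content: str, code_snippets: str) -> int:
    cand_type = simple_type_name(api_candidate)
    score = 0
    # Mention Scope: the mention has a prefix that ends with the candidate type.
    if "." in api_mention:
        prefix = api_mention.rsplit(".", 1)[0]
        if prefix == cand_type or prefix.endswith("." + cand_type):
            score += 1
    # Text Scope: the type occurs in the textual content of the thread.
    if cand_type in tokenize(thread_content):
        score += 1
    # Extended Code Scope: the type occurs as a token in the code snippets.
    if cand_type in tokenize(code_snippets):
        score += 1
    # Code Scope: one point per extracted possible type that matches
    # (assumption: types match when their simple names are equal).
    for p_type in p_types:
        if p_type.split(".")[-1] == cand_type:
            score += 1
    return score


def min_max_normalize(scores: Dict[str, int]) -> Dict[str, float]:
    """Rescale candidate scores to [0, 1] using the minimum and maximum score."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:  # assumption: all candidates tie, so none stands out
        return {api: 0.0 for api in scores}
    return {api: (s - low) / (high - low) for api, s in scores.items()}
```

In this sketch, a candidate whose type shows up in the mention, the text, the code tokens, and the extracted type list collects one point per scope, and the normalized score of the given API method is what would be passed on as score A.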

B. API relevance embedding

We follow the process described in Figure 2 to build the API relevance embedding. Firstly, each thread in Potential Threads needs to be converted into an embedding. A thread may contain m paragraphs and n code snippets. A paragraph is a piece of textual content in a Stack Overflow thread that is separated from other content in the thread by a newline character. A code snippet is a piece of code content in a Stack Overflow thread. It is typically enclosed in a starting tag <pre><code> and an ending tag </code></pre>. Each paragraph is paired with each code snippet to create a thread content pair. Therefore, a Stack Overflow thread has m × n thread content pairs. A natural-programming language model, CodeBERT², is used to extract the semantic meaning of each thread content pair. It encodes the m × n thread content pairs into m × n thread embeddings; a thread embedding is the representation vector of a thread content pair created by CodeBERT's encoder. By converting the pairs from a textual form to a numerical vector form with a pre-trained CodeBERT

² https://github.com/microsoft/CodeBERT

Algorithm 1: Scoring an API Candidate with Type Scoping in DATYS+
Input: ApiMention, PTypesList, APIMethodCandidate, ThreadContent, CodeSnippets
Output: CandScore
 1: CandScore = 0
 2: CandType = getType(APIMethodCandidate)
 3: if hasPrefix(ApiMention) then
 4:     Prefix = getPrefix(ApiMention)
 5:     if endsWith(Prefix, CandType) then
 6:         CandScore = CandScore + 1
 7:     end if
 8: end if
 9: TextualTokens = tokenize(ThreadContent)
10: CodeTokens = tokenize(CodeSnippets)
11: if CandType in TextualTokens then
12:     CandScore = CandScore + 1
13: end if
14: if CandType in CodeTokens then
15:     CandScore = CandScore + 1
16: end if
17: for PType in PTypesList do
18:     if isSameType(PType, CandType) then
19:         CandScore = CandScore + 1
20:     end if
21: end for
22: return CandScore

model, the semantic relationship between the paragraphs and code snippets is extracted. Before the thread content pairs are fed into the encoder of CodeBERT, each pair is pre-processed into the following format:

〈CLS〉 paragraph 〈SEP〉 code snippet 〈EOS〉

〈CLS〉 is the token that marks the start of the pair, according to the design of the RoBERTa model [13], on which CodeBERT is based. 〈SEP〉 is the token that separates a paragraph from a code snippet, and 〈EOS〉 indicates the end of the pair. The maximum number of tokens in a pair fed into the CodeBERT encoder is 512. We set the number of tokens for a paragraph and a code snippet to 254 and 255, respectively. These two numbers add up to 512 once the three special tokens 〈CLS〉, 〈SEP〉, and 〈EOS〉 are counted (254 + 255 + 3 = 512). If the paragraph has fewer than 254 tokens, padding tokens are added to reach 254 tokens. If it has more than 254 tokens, we truncate the paragraph and keep the first 254 tokens. The same process is applied to the code snippet with a limit of 255 tokens. The CodeBERT encoder receives the thread content pairs in this format as inputs and outputs embedding vectors. For a thread with m × n thread content pairs, m × n thread embedding vectors are created, and each thread embedding vector has a length of 768.
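The fixed-length layout described above can be sketched in a few lines of Python. The sketch works on already-tokenized lists and uses the paper's token notation as literal strings; the `<PAD>` string and the function names are hypothetical, and the real CodeBERT tokenizer uses its own special-token strings.

```python
from typing import List

# Token names follow the notation of the text; "<PAD>" is an assumed placeholder.
CLS, SEP, EOS, PAD = "<CLS>", "<SEP>", "<EOS>", "<PAD>"
MAX_PARAGRAPH_TOKENS = 254
MAX_CODE_TOKENS = 255


def fit_to_length(tokens: List[str], length: int) -> List[str]:
    """Keep the first `length` tokens, or right-pad with PAD up to `length`."""
    clipped = tokens[:length]
    return clipped + [PAD] * (length - len(clipped))


def build_pair(paragraph_tokens: List[str], code_tokens: List[str]) -> List[str]:
    """Lay out one thread content pair as CLS + paragraph + SEP + code + EOS."""
    return (
        [CLS]
        + fit_to_length(paragraph_tokens, MAX_PARAGRAPH_TOKENS)
        + [SEP]
        + fit_to_length(code_tokens, MAX_CODE_TOKENS)
        + [EOS]
    )
```

Whatever the input lengths, `build_pair` returns exactly 1 + 254 + 1 + 255 + 1 = 512 tokens, which matches the maximum input length stated for the encoder.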

Secondly, to build the API relevance embedding, the API comment and implementation code also need to be converted into an embedding. The API method comment is a piece of textual content that describes the functionality of the API method and how to use it. The API implementation code is the code inside the API method body that implements the described functionality. The API comment and implementation code are


Fig. 2. How API relevance embeddings are created. The m paragraphs and n code snippets of a thread form m × n thread content pairs, and the comment and implementation code of the given API method form one method content pair. CodeBERT encodes them into m × n thread embeddings and one method embedding (768 dimensions each), which are concatenated into m × n API relevance embeddings (1536 dimensions each).

extracted from the Javadoc and the JAR files, respectively; they are pre-processed into the following format:

〈CLS〉 comment 〈SEP〉 implementation code 〈EOS〉

They are then transformed into a numerical representation vector via the CodeBERT encoder.

Finally, each thread embedding vector and the method embedding vector are concatenated into a single vector. We call this concatenated vector the API relevance embedding. In total, m × n API relevance embedding vectors are created.
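The pairing and concatenation can be sketched as follows. The encoder here is a deterministic toy function, a hypothetical stand-in for the CodeBERT encoder, so that the example runs without any model; only the shapes (m × n vectors of 768 + 768 = 1536 dimensions) reflect the description above.

```python
from itertools import product
from typing import Callable, List, Sequence

EMBEDDING_SIZE = 768
Encoder = Callable[[str, str], List[float]]


def toy_encoder(text_part: str, code_part: str) -> List[float]:
    """Hypothetical stand-in for CodeBERT: a deterministic 768-d vector."""
    seed = len(text_part) + 31 * len(code_part)
    return [((seed + i) % 97) / 97.0 for i in range(EMBEDDING_SIZE)]


def api_relevance_embeddings(paragraphs: Sequence[str], snippets: Sequence[str],
                             api_comment: str, api_code: str,
                             encode: Encoder = toy_encoder) -> List[List[float]]:
    """Pair every paragraph with every snippet, encode each pair, and append
    the single method embedding to each thread embedding."""
    method_embedding = encode(api_comment, api_code)
    return [
        encode(paragraph, snippet) + method_embedding  # list concatenation
        for paragraph, snippet in product(paragraphs, snippets)
    ]
```

For a thread with 3 paragraphs and 2 code snippets, this produces 6 vectors of length 1536, and the second half of every vector is the same method embedding.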

C. API relevance classifier

The API relevance classifier is a binary classifier that uses a neural network with two fully connected layers to predict whether an API relevance embedding comes from a Stack Overflow thread that refers to the given API method.

The API relevance classifier has two modes of operation: training and deployment. In the training mode, the API relevance embeddings are used to train the classifier. When there is an imbalance between positive and negative labels, the minority label is upsampled. Whenever the thread refers to the given API method, all API relevance embeddings created from the thread are labeled as positive. Otherwise, if the given API method is not referred to by the thread, every API relevance embedding of the thread is labeled as negative.
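The thread-level labeling and the upsampling of the minority label can be sketched as below. The text does not specify how the upsampling is carried out, so sampling with replacement until both classes have the same size, the fixed seed, and the function names are assumptions of this sketch.

```python
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (API relevance embedding, label)


def label_thread_embeddings(embeddings: List[List[float]],
                            thread_refers_to_api: bool) -> List[Example]:
    """Every embedding of a thread inherits the thread-level label."""
    label = 1 if thread_refers_to_api else 0
    return [(embedding, label) for embedding in embeddings]


def upsample_minority(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicate minority-class examples until both classes are
    the same size (assumed strategy: sampling with replacement)."""
    rng = random.Random(seed)
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:  # nothing to sample from
        return list(examples)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced
```

With 2 positive and 8 negative examples, the balanced set contains 8 of each; the original examples are all kept and only copies of minority examples are added.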

Fig. 3. Computing joint relevance score. Given a thread and an API as input, DATYS+ outputs the score A of the given API method and the API relevance classifier outputs the average probability B of the positive class. The two are combined into the joint relevance score C = x × A + (1 − x) × B. If C is greater than the threshold, the thread refers to the API; otherwise, it does not.

In the deployment mode, the API relevance classifier produces probability scores for the m × n API relevance embeddings. These scores are averaged and passed to the next step. The averaged score indicates the likelihood that the thread refers to the given API.

D. Computing joint relevance score

We follow the process in Figure 3 to compute the joint relevance score. DATYS+ and the API relevance classifier output scores A and B, respectively. Both represent their confidence that the given API method is mentioned in the thread. The two scores are then combined into a joint relevance score C using the following formula:

$$C = x \times A + (1 - x) \times B \tag{1}$$

The weighting factor x decides the contributions of the DATYS+ score and the API relevance classifier score to the joint relevance score C. The higher the value of x, the more the DATYS+ score contributes to the final joint relevance score. A, B, and x all range from 0 to 1. A thread is considered to refer to the given API if the joint relevance score C is larger than a threshold t. Otherwise, the thread is considered not to refer to the given API. By default, t is set to 0.5.
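Equation (1) and the threshold decision amount to a few lines of arithmetic. The following Python sketch uses hypothetical function and parameter names; B is computed as the average of the per-pair probabilities, as described for the deployment mode.

```python
from statistics import mean
from typing import Sequence


def joint_relevance_score(datys_plus_score: float,
                          pair_probabilities: Sequence[float],
                          x: float) -> float:
    """C = x * A + (1 - x) * B, where B averages the m x n pair probabilities."""
    b = mean(pair_probabilities)
    return x * datys_plus_score + (1 - x) * b


def thread_refers_to_api(datys_plus_score: float,
                         pair_probabilities: Sequence[float],
                         x: float, threshold: float = 0.5) -> bool:
    """A thread is accepted only if C is strictly larger than the threshold t."""
    return joint_relevance_score(datys_plus_score, pair_probabilities, x) > threshold
```

As a worked instance with illustrative numbers, A = 1.0, pair probabilities 0.2 and 0.4 (so B = 0.3), and x = 0.3 give C = 0.3 × 1.0 + 0.7 × 0.3 = 0.51, which exceeds the default threshold of 0.5.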

The value of x is estimated from the training data. In detail, we let x increase gradually from 0 to 1 with a step of 0.1. There are eleven possible values of x:


TABLE I. Number of API relevance embeddings in each set

Set             API relevance embeddings
Training set    57,690
Testing set     26,212

0, 0.1, 0.2, ..., 0.9, 1.0. The value of x giving the highest performance on the training data is then chosen.
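This selection is a small grid search. The sketch below assumes a caller-supplied function `evaluate_f1` (a hypothetical name) that returns the average F1-score on the training data for a given x; ties are resolved in favor of the smallest x, which is an assumption of the sketch.

```python
from typing import Callable, List


def candidate_weights(step_count: int = 10) -> List[float]:
    """Return 0, 0.1, ..., 1.0 (step_count + 1 evenly spaced values)."""
    return [i / step_count for i in range(step_count + 1)]


def choose_weight(evaluate_f1: Callable[[float], float]) -> float:
    """Pick the candidate x with the highest training F1-score.
    max() keeps the first maximum, so ties go to the smallest x."""
    return max(candidate_weights(), key=evaluate_f1)
```

Fed with the average training F1-scores reported for each x, this procedure selects x = 0.3.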

V. EXPERIMENT

A. Dataset and Experimental Settings

We use the dataset provided by the DATYS work [6] to evaluate both FACOS and DATYS. We split the 380 Stack Overflow threads into 253 training threads and 127 testing threads, a ratio of 2:1. The training threads are used to train the API relevance classifier, while the testing threads are used to evaluate FACOS and DATYS. Next, as described in Section IV-B, for each Stack Overflow thread in the training threads, we extract its thread embeddings, and these thread embeddings are grouped into a training set. Similarly, for each Stack Overflow thread in the testing threads, we extract its thread embeddings, and these thread embeddings are grouped into a testing set. The numbers of API relevance embeddings in the dataset are shown in Table I: 57,690 in the training set and 26,212 in the testing set.

To generate API relevance embeddings for training the API relevance classifier, for each thread in which the given API appears, we generate API relevance embeddings from the thread contents and method contents as described in Section IV-B. These embeddings have a positive label because they are created from the API that is referred to by the thread. To generate embeddings with a negative label for a thread, we find APIs that have the same simple name as the given API and are not mentioned in the thread. We then create API relevance embeddings from these APIs and label them as negative.

The training set covers 344 APIs, which are used to generate the training method embeddings. The testing set covers 181 APIs, which are used to generate the testing method embeddings. Table II shows the numbers of positive and negative API relevance embeddings created in the training and testing sets. The number of negative API relevance embeddings is approximately 4 times that of the positive ones in the same set of threads. Due to this imbalance, positive API relevance embeddings are randomly up-sampled to balance the two classes during the training of the API relevance classifier.

The API relevance classifier is trained for 6 epochs on the training data. After these 6 epochs, the value of the loss function has largely converged. The learning rate is set to $10^{-3}$.

B. Metrics

To evaluate the proposed approach on identifying threads that are relevant to an API, we use three metrics: Precision,

TABLE II. Number of positive and negative API relevance embeddings in each set

Embeddings                            Count
Positive embeddings in training set    9,934
Negative embeddings in training set   47,756
Positive embeddings in testing set     5,607
Negative embeddings in testing set    20,605

Recall, and F1-score. To calculate these three metrics, True Positive, False Positive, and False Negative must be defined first. Our task focuses on finding threads that actually refer to a given API. A True Positive is a thread that is deemed relevant by the approach and is indeed relevant. A False Positive is a thread that is deemed relevant by the approach but is actually irrelevant. A False Negative is a thread that is deemed irrelevant by the approach but is actually relevant. The metrics are calculated using the following formulas:

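$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$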

We measure the above scores for each given API in the testing set and report the averages of the scores.
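Equations (2) to (4) and the per-API averaging can be sketched as follows. Threads are represented by their IDs in sets; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the text does not specify.

```python
from statistics import mean
from typing import Dict, Set, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1-score)


def precision_recall_f1(predicted: Set[str], relevant: Set[str]) -> Scores:
    """Scores for one API: `predicted` holds the threads the approach deems
    relevant, `relevant` holds the threads that truly refer to the API."""
    tp = len(predicted & relevant)
    fp = len(predicted - relevant)
    fn = len(relevant - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed zero convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_over_apis(per_api: Dict[str, Tuple[Set[str], Set[str]]]) -> Scores:
    """Average each metric over all given APIs."""
    scores = [precision_recall_f1(pred, rel) for pred, rel in per_api.values()]
    return (mean(s[0] for s in scores),
            mean(s[1] for s in scores),
            mean(s[2] for s in scores))
```

For example, if an approach returns threads {t1, t2, t3} for an API whose relevant threads are {t1, t2, t4, t5}, there are 2 true positives, 1 false positive, and 2 false negatives, giving a precision of 2/3, a recall of 1/2, and an F1-score of 4/7.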

C. Research Questions

Research Question 1: Can FACOS perform better than the baseline (DATYS)?
The baseline DATYS was designed for the task of API mention disambiguation. We adapt it to our task of finding threads that are relevant to an API. If DATYS finds that an API is mentioned in the thread, the thread is considered relevant to the API. To evaluate the improvement of FACOS over DATYS, we evaluate them on the testing data set and compare them in terms of F1-score. We also analyse some cases that FACOS can resolve and DATYS cannot in Section VII-A.

Research Question 2: How well does each component of FACOS perform?
There are three possible variants of FACOS, depending on which components it includes. The variants are (1) FACOS with only the API relevance classifier, (2) FACOS with only DATYS+, and (3) FACOS with both DATYS+ and the API relevance classifier. The API relevance classifier is a semantic-based algorithm, while DATYS+ is a syntactic-based algorithm. In this study, we aim to analyze the contribution of each component of FACOS. From the analysis, we would like to answer the question of whether combining a semantic-based algorithm and a syntactic-based algorithm leads to a better result than running them individually.

Research Question 3: How does the weighting factor affect the F1-score of the relevant thread classification? Does our strategy work well?


TABLE III. FACOS vs. DATYS in terms of F1-score in the testing set

Approach    Avg. Precision    Avg. Recall    Avg. F1-score
DATYS       0.7441            0.7703         0.7340
FACOS       0.8697            0.9016         0.8730

TABLE IV. Contribution of FACOS Components

Components                                        Avg. Precision    Avg. Recall    Avg. F1-score
FACOS                                             0.8697            0.9016         0.8730
FACOS with only API relevance classifier          0.3408            0.3658         0.3408
FACOS with only DATYS+                            0.8620            0.8723         0.8530

The weighting factor affects how well FACOS performs. We select the weighting factor based on the best performance on the training data, and we analyze whether this strategy also leads to the best performance on the testing data. We vary the value of the weighting factor on both the training data and the testing data. The values that we use are 0, 0.1, 0.2, ..., 0.9, 1.0. We analyze whether the value that leads to the best performance on the training data also leads to the best performance on the testing data.

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. FACOS outperforms DATYS overall. On average, FACOS achieves an F1-score of 0.873, which is an improvement of 13.9% compared to DATYS (0.8730 vs. 0.7340, an absolute difference of 0.139). FACOS also beats DATYS in terms of precision and recall.

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the “API relevance classifier-only” version of FACOS gives the worst result, the API relevance classifier may not be able to solve the task well on its own. This might be partly due to the limited amount of training data (i.e., only 253 training threads). In addition, the “DATYS+ only” version of FACOS performs much better than the “API relevance classifier-only” version. However, the full FACOS is still better than both of them, which demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table VI shows the performance of FACOS on the training set when we vary the value of the weighting factor. The bold numbers in the table are the average F1-scores of the chosen values of x in the training sets. Similarly, Table V shows the performance of FACOS on the testing set when we vary the value of the weighting factor. The highest F1-score for both the training and testing sets is achieved when the value of

TABLE V. Average Precision, average Recall, and average F1-score on the testing set when the weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.3420            0.3658         0.3408
0.1    0.4485            0.4653         0.4441
0.2    0.8650            0.8925         0.8641
0.3    0.8697            0.9016         0.8730
0.4    0.8684            0.8934         0.8689
0.5    0.8588            0.8723         0.8097
0.6    0.8588            0.8723         0.8510
0.7    0.8606            0.8723         0.8521
0.8    0.8606            0.8723         0.8521
0.9    0.8606            0.8723         0.8521
1.0    0.8620            0.8723         0.8530

TABLE VI. Average Precision, average Recall, and average F1-score on the training set when the weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.6565            0.6685         0.6473
0.1    0.7111            0.7272         0.7080
0.2    0.8261            0.8506         0.8269
0.3    0.8328            0.8498         0.8329
0.4    0.8265            0.8410         0.8254
0.5    0.8159            0.8234         0.8097
0.6    0.8132            0.8234         0.8079
0.7    0.8132            0.8234         0.8079
0.8    0.8132            0.8234         0.8079
0.9    0.8132            0.8234         0.8079
1.0    0.8180            0.8191         0.8073

the weighting factor is equal to 0.3. This demonstrates that our strategy of picking the value of the weighting factor that leads to the best performance on the training data works well.

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method

Figure 4 shows an example of a case where the content of the thread does not explicitly relate to the given API method. It shows a paragraph and a code snippet of the Stack Overflow thread with ID 56135373³. org.mockito.stubbing.OngoingStubbing.thenReturn⁴ is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only shows the user's view of the code snippet, without a description mentioning the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as “This works like

³ https://stackoverflow.com/questions/56135373
⁴ https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html


Fig. 4. Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread.

Fig. 5. Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread.

charm” do not provide much information to identify whether the observed API method refers to the given API.

Therefore, we leverage the content of the thread, which might be relevant to the content of the API method. For example, in the thread above, its title, shown in Figure 6a, “Optional cannot be returned by stream() in Mockito Test classes”, relates to the comment of the given API, “Sets a return value to be returned when the method is called”, in Figure 6b. Thanks to this feature, FACOS successfully considers this thread relevant, while DATYS misses it.

(2) The irrelevant thread contains the type name of the given API method

An example of this case is shown in Figure 5. In the thread⁵, the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, is, which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learnt by the API relevance classifier, FACOS is able to detect the irrelevance between the textual content and code snippet around the word is and the API comment and implementation code. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

⁵ https://stackoverflow.com/questions/16919751

Fig. 6. The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn in Figure 6a and the textual content (i.e., the title) of thread 56135373 in Figure 6b.

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude the thread⁶ from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when there is an API method that has functionality similar to that of the given API method. Such methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service⁷). Since both of them have an API method whose simple name is mock, and both mock methods have the same API signature (i.e., parameters, return type), it would be easy to mistake one for the other, even for a human. Figure 8 shows the comment of API method org.mockito.Mockito.mock⁸, which is “Creates mock object of given class or interface”. Because of the similarity between the comment of the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly recognizes that the simple API name mock in the thread refers to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.

⁶ https://stackoverflow.com/questions/30127057
⁷ https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
⁸ https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7. Thread 30127057 on Stack Overflow, which FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8. The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. For some libraries, methods are chained (as in Figure 4), so the type is not displayed, and a type system is required to determine it. In addition, the type system is not 100 percent reliable due to a lack of import information, variables, etc. It is also ineffective to rely on the similarity of word representations provided by language models alone for API content search. API content search differs from other tasks such as code search. In code search, because the terms “transfer” and “convert” are both related to currency, the method named “transferSGDToUSD” might be the answer to the query “convert SGD to USD”. However, searching for an API method requires an explicit mention of the API name in the query, which can be difficult as different APIs may share the same API method names. Solving the problem solely with a model that only learns the syntactic meaning of textual and code content is insufficient. Therefore, we incorporated semantic information for API content search, which generated better results, as demonstrated by the results of our experiment.

D Threats to validity

A threat to internal validity is related to experiment bias Weobtain our dataset from another work We also run the baselineusing the code provided by the author We then check our codemultiple times to ensure that we do not make mistakes We

believe there should be little threats to internal validity Wealso release the dataset and the code for our experiments forall to use

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. This experiment mainly focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also talk about API issues. A potential platform is Reddit, which has subreddits (i.e., places gathering Reddit's threads discussing a particular problem) discussing programming languages and frameworks. It has a title, textual content, and code content, the same as Stack Overflow. Thus, the similarity suggests that we can potentially apply FACOS to Reddit too. We leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) required for the approach can be found in other programming languages. Therefore, we leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity: whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task. Many works in software engineering have used precision, recall, and F1-score as the evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threats are minimal.

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. There are two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As suggested by its name, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches such as the Vector Space Model and Latent Semantic Indexing to disambiguate API mentions [18]–[20], while some others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API in a thread, the disambiguated API is relevant to the thread, as the thread is talking about the API.


B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding ability of existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. As opposed to assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a single user query into a fixed-length context vector to generate an API sequence.

Other studies have also exploited different aspects of APIs and natural language to better retrieve APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built DistilBERT-based argumentation retrieval for answering comparative questions. Dibia [34] and Vale and Maia [35] developed, respectively, a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API.

C. Contribution of Stack Overflow for API documentation

Treude and Robillard [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas Squire [38] described Stack Overflow as social media for developer support in terms of the utility it provides. Ahasanuzzaman et al. [39] and Baltes et al. [40] worked, respectively, on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow. The dichotomy of these studies is notable: some research, such as [41] and [42], studies how API documentation fails and how APIs are misused on Stack Overflow, while other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow for automated API documentation with tutorials. Similarly, crowdsourced knowledge was hailed by [45] and [46], who explored innovation diffusion and web resource recommendation for hyperlinks through link sharing on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find threads about a particular API on Stack Overflow, which can then be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

There are several works focused on the disambiguation task [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have addressed a myriad of problems by resolving lexical ambiguities. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Researchers have associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Researchers used this approach to add meaning to social network posts when it comes to named entity recognition and disambiguation. Entity-fishing tools were also developed to facilitate entity recognition and disambiguation as a service. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS (an average F1-score of 0.873 versus 0.734) when both approaches are adapted to the search task. We have added a weight parameter to balance the usage of syntactic and semantic information for retrieving API mentions and related threads. We have demonstrated the utility of the weight factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package. The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, "What makes apis hard to learn? answers from developers," IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, "What do client developers concern when using web apis? an empirical study on developer forums and stack overflow," in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, "How do api changes trigger stack overflow discussions? a study on the android sdk," in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, "Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow," Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, "Automatic mining of opinions expressed about apis in stack overflow," IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, "Disambiguating mentions of api methods in stack overflow via type scoping," in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, "Rack: Automatic api recommendation using crowdsourced knowledge," in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., "Codebert: A pre-trained model for programming and natural languages," arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "Electra: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, "Apireal: an api recognition and linking approach for online developer forums," Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, "Identifying self-admitted technical debt in open source projects using text mining," Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, "Categorizing the content of github readme files," Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, "Learning from examples to find fully qualified names of api elements in code snippets," in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, "Recovering traceability links between code and documentation," TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, "Recovering documentation-to-source-code traceability links using latent semantic indexing," in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, "Linking e-mails and source code artifacts," in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, "Recovering traceability links between an api and its learning resources," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, "Live api documentation," in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, "Statistical learning of api fully qualified names in code snippets of online forums," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, "Codehow: Effective code search based on api understanding and extended boolean model (e)," in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, "Deep api learning," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, "Complementing global and local contexts in representing api descriptions to improve api retrieval tasks," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, "Leveraging usage similarity for effective retrieval of examples in code repositories," in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE '10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, "From word embeddings to document similarities for improved information retrieval in software engineering," in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE '16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, "Exploiting user knowledge during retrieval of semantically annotated api operations," in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR '11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, "Trans^3: A transformer-based framework for unifying code summarization and code search," arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, "Api2com: On the improvement of automatically generated code comments using api documentations," arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, "Distilbert-based argumentation retrieval for answering comparative questions," Working Notes of CLEF, 2021.

[34] V. Dibia, "Neuralqa: A usable library for question answering (contextual query expansion + bert) on large datasets," arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, "Towards a question answering assistant for software development using a transformer-based language model," arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, "An empirical study on the usage of transformer models for code completion," arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, "Augmenting api documentation with insights from stack overflow," in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, "'should we move to stack overflow?' measuring the utility of social media for developer support," in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, "Classifying stack overflow posts on api issues," in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, "Contextual documentation referencing on stack overflow," IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, "How api documentation fails," IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, "Are code examples on an online q&a forum reliable?: a study of api misuse on stack overflow," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, "Crowdsourced knowledge on stack overflow: A systematic mapping study," in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, "Automated api documentation with tutorials generated from stack overflow," in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, "A study of innovation diffusion through link sharing on stack overflow," in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, "From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, "Statistical translation of english texts to api code templates," in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, "Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance," International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, "Geotxt: A web api to leverage place references in text," in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR '13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, "Senserelate::Targetword - a generalized framework for word sense disambiguation," in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation," in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, "entity-fishing: a dariah entity recognition and disambiguation service," Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, "Do we need entity-centric knowledge bases for entity disambiguation?" in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, "Geek: Incremental graph-based entity disambiguation," in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, "Combining heterogeneous classifiers for word sense disambiguation," in Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, "A fully unsupervised word sense disambiguation method using dependency knowledge," in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.


  • I Introduction
  • II Preliminaries
    • II-A DATYS
    • II-B CodeBERT
  • III Approach Overview
    • III-A Task Definition
    • III-B Architecture
  • IV FACOS
    • IV-A DATYS+
    • IV-B API relevance embedding
    • IV-C API relevance classifier
    • IV-D Computing joint relevance score
  • V Experiment
    • V-A Dataset and Experimental Settings
    • V-B Metrics
    • V-C Research Questions
  • VI Result
    • VI-A RQ1: FACOS Effectiveness
    • VI-B RQ2: Ablation Study
    • VI-C RQ3: Effect of the weighting factor
  • VII Discussion
    • VII-A Cases where FACOS outperforms DATYS
    • VII-B Case where FACOS fails to exclude irrelevant threads
    • VII-C Adding Semantic Information for API Content Search
    • VII-D Threats to validity
  • VIII Related Work
    • VIII-A API Disambiguation
    • VIII-B API Resource Retrieval
    • VIII-C Contribution of StackOverflow for API documentation
    • VIII-D Word Sense and Entity Disambiguation Study
  • IX Conclusion and Future Work
  • References

[Figure 2: A thread with m paragraphs and n code snippets yields m × n thread content pairs. The comment and implementation code of the given API method yield 1 method content pair. CodeBERT encodes these into m × n thread embeddings and 1 method embedding (768 dimensions each), which are concatenated into m × n API relevance embeddings (1536 dimensions each).]

Fig. 2. How API relevance embeddings are created

extracted from the Javadoc and the JAR files, respectively, they are pre-processed to the following format:

〈CLS〉 comment 〈SEP〉 implementation code 〈EOS〉

They are then transformed into a numerical representation vector via the CodeBERT encoder.

Finally, each thread embedding vector and the method embedding vector are concatenated into a single vector. We call this concatenated vector the API relevance embedding. In total, m × n API relevance embedding vectors are created.
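The pairing and concatenation steps can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_encode` is a hypothetical stand-in for the CodeBERT encoder, the literal `<CLS>`, `<SEP>`, and `<EOS>` strings only mirror the layout shown above, and applying the same layout to the thread content pairs is an assumption.

```python
from itertools import product

EMBED_DIM = 768  # per-embedding size reported in Fig. 2


def format_pair(first, second):
    # Mirrors the <CLS> ... <SEP> ... <EOS> layout described in the text.
    return f"<CLS> {first} <SEP> {second} <EOS>"


def toy_encode(text):
    # Hypothetical stand-in for the CodeBERT encoder: returns a deterministic
    # 768-dimensional vector derived from the text length.
    return [float(len(text) % 7)] * EMBED_DIM


def api_relevance_embeddings(paragraphs, snippets, comment, implementation):
    # One method embedding is shared by every thread content pair.
    method_vec = toy_encode(format_pair(comment, implementation))
    vectors = []
    for paragraph, snippet in product(paragraphs, snippets):  # m x n pairs
        thread_vec = toy_encode(format_pair(paragraph, snippet))
        vectors.append(thread_vec + method_vec)  # 768 + 768 = 1536 values
    return vectors


paragraphs = ["How do I stub this call?", "This works like a charm."]
snippets = ["when(foo.bar()).thenReturn(1);"]
embeddings = api_relevance_embeddings(
    paragraphs, snippets, "Sets a return value.", "T thenReturn(T value);"
)
```

With m = 2 paragraphs and n = 1 code snippet, the sketch produces two vectors of 1536 values each.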

C. API relevance classifier

The API relevance classifier is a binary classifier that utilizes a neural network with two fully connected layers to predict whether the API relevance embedding comes from a Stack Overflow thread that refers to the given API method.

The API relevance classifier has two modes of operation: training and deployment. In the training mode, the API relevance embeddings are used to train the API relevance classifier. When there is an imbalance between positive and negative labels, the minority label is upsampled. Whenever the thread refers to the given API method, all API relevance embeddings created from the thread are labeled as positive. Otherwise, when the given API method is not referred to by the thread, every API relevance embedding of the thread is labeled as negative.
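The labeling rule and the upsampling step can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and sampling with replacement until the classes are equal in size is one plausible reading of "upsamples the minority label".

```python
import random


def label_embeddings(embeddings, thread_refers_to_api):
    # Every embedding created from a thread inherits the thread-level label.
    label = 1 if thread_refers_to_api else 0
    return [(vec, label) for vec in embeddings]


def upsample_minority(samples, seed=0):
    # Randomly duplicate minority-class samples until both classes have
    # the same size (assumed strategy; the paper only states upsampling).
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    if not minority:
        return list(samples)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


samples = label_embeddings([[0.1], [0.2]], True) + label_embeddings([[0.3]] * 8, False)
balanced = upsample_minority(samples)
```

In this toy run, two positive and eight negative samples become eight of each.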

[Figure 3: Given a thread and an API as input, DATYS+ outputs score A of the given API method, and the API relevance classifier outputs the average probability B of the positive class. The two outputs are combined into the joint relevance score C = x × A + (1 − x) × B. If C > threshold, the thread refers to the API; otherwise, the thread does not refer to the API.]

Fig. 3. Computing joint relevance score

In the deployment mode, the API relevance classifier produces probability scores for the m × n API relevance embeddings. These scores are averaged and passed to the next step. The averaged score indicates the likelihood that the thread refers to the given API.

D. Computing joint relevance score

We follow the process in Figure 3 to compute the joint relevance score. DATYS+ and the API relevance classifier output scores A and B, respectively. Both represent their confidence that the given API method is mentioned in the thread. The two scores are then combined into a joint relevance score C following this formula:

$$C = x \times A + (1 - x) \times B \tag{1}$$

The weighting factor x decides the contributions of the DATYS+ score and the API relevance classifier score to the joint relevance score C. The higher the value of x, the more the DATYS+ score contributes to the final joint relevance score. A, B, and x all range from 0 to 1. A thread is considered to refer to the given API if the joint relevance score C is larger than a threshold t. Otherwise, the thread is considered not to refer to the given API. By default, t is set to 0.5.
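As a small worked illustration of Equation (1) and the threshold rule, the sketch below averages hypothetical classifier probabilities into B and combines them with a hypothetical DATYS+ score A. The function names and the input values are assumptions made for the example.

```python
def average_probability(probabilities):
    # Score B: mean positive-class probability over the m x n embeddings.
    return sum(probabilities) / len(probabilities)


def joint_relevance_score(a, b, x):
    # Equation (1): C = x * A + (1 - x) * B, with A, B, x in [0, 1].
    if not all(0.0 <= value <= 1.0 for value in (a, b, x)):
        raise ValueError("A, B and x must lie in [0, 1]")
    return x * a + (1.0 - x) * b


def refers_to_api(a, b, x, t=0.5):
    # The thread is considered to refer to the API only if C exceeds t.
    return joint_relevance_score(a, b, x) > t


score_a = 0.9                                   # hypothetical DATYS+ score
score_b = average_probability([0.2, 0.4, 0.3])  # hypothetical classifier output
decision_low_x = refers_to_api(score_a, score_b, x=0.3)
decision_mid_x = refers_to_api(score_a, score_b, x=0.5)
```

With these made-up inputs, x = 0.3 gives C of about 0.48 (below the default threshold), while x = 0.5 gives about 0.60 (above it), which shows how the weighting factor can flip the decision.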

TABLE I. Number of API relevance embeddings in each set

| Set | API relevance embeddings |
| --- | --- |
| Training set | 57,690 |
| Testing set | 26,212 |

The value of x is estimated based on the training data. In detail, we let x increase gradually from 0 to 1 with a step of 0.1. There are eleven possible values of x: 0, 0.1, 0.2, ..., 0.9, 1.0. The value of x giving the highest performance in the training data is then chosen.
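The selection of x amounts to a small grid search, sketched below. This is a simplification rather than the authors' procedure: the paper averages F1-scores over APIs, whereas the sketch pools all examples into one F1-score, and the training triples (A, B, label) are invented for illustration.

```python
def f1_score(predictions, labels):
    # Pooled F1 over all examples (a simplification of per-API averaging).
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_weight(examples, threshold=0.5):
    # Try x = 0, 0.1, ..., 1.0 and keep the first value with the best F1.
    labels = [label for _, _, label in examples]
    best_x, best_f1 = 0.0, -1.0
    for step in range(11):
        x = step / 10
        predictions = [x * a + (1 - x) * b > threshold for a, b, _ in examples]
        score = f1_score(predictions, labels)
        if score > best_f1:
            best_x, best_f1 = x, score
    return best_x, best_f1


# Hypothetical (A, B, thread_is_relevant) triples.
toy_training = [
    (0.90, 0.20, True),
    (0.60, 0.90, True),
    (0.40, 0.10, False),
    (0.55, 0.10, False),
    (0.30, 0.95, False),
]
best_x, best_f1 = choose_weight(toy_training)
```

Ties are broken in favor of the smallest x, which is a design choice of this sketch and is not stated in the text.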

V. EXPERIMENT

A. Dataset and Experimental Settings

We utilize the dataset provided in the DATYS work [6] to evaluate both FACOS and DATYS. We split 380 Stack Overflow threads into 253 training threads and 127 testing threads, a ratio of 2:1. The training threads are utilized to train the API relevance classifier, while the testing threads are used to evaluate FACOS and DATYS. Next, as mentioned in Section IV-B, for each Stack Overflow thread in the training threads, we extract its thread embeddings, and these thread embeddings are grouped into a training set. Similarly, for each Stack Overflow thread in the testing threads, we extract its thread embeddings, and these thread embeddings are grouped into a testing set. The numbers of API relevance embeddings in the dataset are shown in Table I. The numbers of embeddings in the training set and the testing set are 57,690 and 26,212, respectively.

To generate API relevance embeddings for training the API relevance classifier, for each thread in which the given API appears, we generate API relevance embeddings from the thread contents and method contents as described in Section IV-B. These embeddings have a positive label because they are created from the API that is referred to by the thread. To generate embeddings with a negative label for a thread, we find APIs that have the same simple name as the given API and are not mentioned in the thread. We then create API relevance embeddings from these APIs and label them as negative.
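The negative-sampling rule can be sketched as a filter over a candidate API list. The helper names and the candidate list are hypothetical; the sketch assumes that a simple name is the last dot-separated segment of a fully qualified method name.

```python
def simple_name(fully_qualified_name):
    # "org.mockito.Mockito.mock" -> "mock"
    return fully_qualified_name.rsplit(".", 1)[-1]


def negative_candidates(given_api, apis_in_thread, all_apis):
    # APIs sharing the given API's simple name but not mentioned in the thread.
    target = simple_name(given_api)
    mentioned = set(apis_in_thread)
    return [
        api for api in all_apis
        if simple_name(api) == target and api not in mentioned
    ]


# Hypothetical candidate list used only for illustration.
all_apis = [
    "org.mockito.Mockito.mock",
    "org.powermock.api.mockito.PowerMockito.mock",
    "org.mockito.Mockito.when",
]
negatives = negative_candidates(
    "org.mockito.Mockito.mock", ["org.mockito.Mockito.mock"], all_apis
)
```

Each API returned here would be paired with the thread contents to create negatively labeled API relevance embeddings.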

The training set covers 344 APIs, which are used to generate the training method embeddings. The testing set covers 181 APIs, which are used to generate the testing method embeddings. Table II shows the numbers of positive and negative API relevance embeddings created in the training and testing sets. The number of negative API relevance embeddings is approximately 4 times the number of positive ones in the same set of threads. Due to this imbalance, positive API relevance embeddings are randomly up-sampled to balance the two classes within the API relevance classifier training process.

The API relevance classifier is trained for 6 epochs on the training data. After 6 epochs, the value of the loss function has largely converged. The learning rate is set to $10^{-3}$.

B. Metrics

TABLE II. Number of positive and negative API relevance embeddings in each set

| Embeddings | Count |
| --- | --- |
| Positive embeddings in training set | 9,934 |
| Negative embeddings in training set | 47,756 |
| Positive embeddings in testing set | 5,607 |
| Negative embeddings in testing set | 20,605 |

To evaluate the proposed approach on identifying threads that are relevant to an API, we use three metrics: Precision, Recall, and F1-score. In order to calculate these metrics, True Positive, False Positive, and False Negative should be defined first. Our task focuses on finding threads that actually refer to a given API. True Positive is the case where a thread that is deemed relevant by the approach is indeed relevant. False Positive is the case where a thread that is deemed relevant by the approach is actually irrelevant. False Negative is the case where a thread that is deemed irrelevant by the approach is actually relevant. The metrics are calculated using the following formulas:

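$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$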

We measure the above scores for each given API in the testing set and report the averages of the scores.
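Equations (2) to (4) and the averaging over APIs can be sketched as below. The per-API counts are invented for illustration, and returning 0.0 when a denominator is zero is a convention assumed by this sketch, not something stated in the text.

```python
def precision_recall_f1(tp, fp, fn):
    # Equations (2)-(4); zero denominators fall back to 0.0 (assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_over_apis(per_api_counts):
    # Compute the three metrics per API, then average each metric across APIs.
    scores = [precision_recall_f1(tp, fp, fn) for tp, fp, fn in per_api_counts]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical (TP, FP, FN) counts for two APIs.
avg_precision, avg_recall, avg_f1 = average_over_apis([(8, 2, 2), (1, 1, 0)])
```

Averaging per-API scores weights every API equally, regardless of how many threads mention it.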

C. Research Questions

Research Question 1: Can FACOS perform better than the baseline (DATYS)?
The baseline, DATYS, was designed for the task of API mention disambiguation. We adapt it to our task of finding threads that are relevant to an API. If DATYS finds that an API is mentioned in the thread, the thread is considered to be relevant to the API. To evaluate the improvement of FACOS over DATYS, we evaluate them on the testing data set and compare them in terms of F1-score. We also analyze some cases that FACOS can resolve and DATYS cannot in Section VII-A.

Research Question 2: How well does each component of FACOS perform?
There are three possible variants of FACOS, depending on which components are included. The variants are (1) FACOS with only the API relevance classifier, (2) FACOS with only DATYS+, and (3) FACOS with both DATYS+ and the API relevance classifier. The API relevance classifier is a semantic-based algorithm, while DATYS+ is a syntactic-based algorithm. In this study, we aim to analyze the contribution of each component of FACOS. From the analysis, we would like to answer the question of whether combining a semantic-based algorithm and a syntactic-based algorithm leads to a better result than running them individually.

Research Question 3: How does the weighting factor affect the F1-score of the relevant thread classification? Does our strategy work well?
The weighting factor is an important factor that affects how well FACOS performs. We select the weighting factor based on the best performance in the training data. We analyze whether our strategy leads to the best performance in the testing data. We vary the value of the weighting factor in both the training data and the testing data. The values that we use are 0, 0.1, 0.2, ..., 0.9, 1.0. We analyze whether picking the value that leads to the best performance in the training data also leads to the best performance in the testing data.

TABLE III. FACOS vs. DATYS in terms of F1-score in the testing set

| Approach | Avg. Precision | Avg. Recall | Avg. F1-score |
| --- | --- | --- | --- |
| DATYS | 0.7441 | 0.7703 | 0.7340 |
| FACOS | 0.8697 | 0.9016 | 0.8730 |

TABLE IV. Contribution of FACOS Components

| Components | Avg. Precision | Avg. Recall | Avg. F1-score |
| --- | --- | --- | --- |
| FACOS | 0.8697 | 0.9016 | 0.8730 |
| FACOS with only API relevance classifier | 0.3408 | 0.3658 | 0.3408 |
| FACOS with only DATYS+ | 0.8620 | 0.8723 | 0.8530 |

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. FACOS in general outperforms DATYS. On average, FACOS achieves an F1-score of 0.873, which is an absolute improvement of 13.9% over DATYS (0.734). FACOS also beats DATYS in terms of precision and recall.
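As a quick check on how the improvement figure relates to Table III, the arithmetic below uses only the two average F1-scores reported there. The first quantity is the absolute difference, which matches the 13.9% figure; the relative gain is computed here only for comparison and is not a figure stated in the text.

```latex
\Delta_{\text{abs}} = 0.8730 - 0.7340 = 0.1390,
\qquad
\Delta_{\text{rel}} = \frac{0.1390}{0.7340} \approx 0.189
```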

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the "API relevance classifier-only" version of FACOS gives the worst result, the API relevance classifier may not be able to resolve the task well independently. This might partly be due to the limited amount of training data (i.e., only 253 training threads). In addition, the "DATYS+ only" version of FACOS performs much better than the "API relevance classifier-only" version. However, the full FACOS is still better than both of them. This demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table V shows the performance of FACOS in the trainingset when we vary the values of weighting factor The boldnumbers in each row of the table are the average F1-scores ofthe chosen values of x in the training sets Similarly Table VIthe performance of FACOS in the test set when we vary thevalues of weighting factor The highest F1-score for boththe training and testing set is achieved when the value of

TABLE V Average Precision average Recall and averageF1-score of testing sets when weighting factor varies

x Avg Precision Avg Recall Avg F1-score0 03420 03658 0340801 04485 04653 0444102 08650 08925 0864103 08697 09016 0873004 08684 08934 0868905 08588 08723 0809706 08588 08723 0851007 08606 08723 0852108 08606 08723 0852109 08606 08723 0852110 08620 08723 08530

TABLE VI Average Precision average Recall and averageF1-score of training sets when weighting factor varies

x Avg Precision Avg Recall Avg F1-score0 06565 06685 0647301 07111 07272 0708002 08261 08506 0826903 08328 08498 0832904 08265 08410 0825405 08159 08234 0809706 08132 08234 0807907 08132 08234 0807908 08132 08234 0807909 08132 08234 0807910 08180 08191 08073

the weighting factor is equal to 03 It demonstrates that ourstrategy to pick the value of the weighting factor that leads tothe best performance in the training data works really well
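The selection strategy discussed in this subsection can be replayed from the reported numbers. The sketch below copies the average F1-score columns of Tables V and VI, assuming (as their captions state) that Table V holds the testing-set scores and Table VI the training-set scores. It checks that the value of x that is best on the training set is also best on the testing set. The list and function names are ours.

```python
# Candidate values of the weighting factor x.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Average F1-scores copied from Table VI (training set).
train_f1 = [0.6473, 0.7080, 0.8269, 0.8329, 0.8254, 0.8097,
            0.8079, 0.8079, 0.8079, 0.8079, 0.8073]

# Average F1-scores copied from Table V (testing set).
test_f1 = [0.3408, 0.4441, 0.8641, 0.8730, 0.8689, 0.8097,
           0.8510, 0.8521, 0.8521, 0.8521, 0.8530]


def best_x(scores):
    """Return the value of x with the highest score (first one on ties)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return xs[best_index]


chosen_on_train = best_x(train_f1)
best_on_test = best_x(test_f1)
print(chosen_on_train, best_on_test)
```

With the reported values, both calls return 0.3, which matches the conclusion drawn in the text.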

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method.

Figure 4 shows an example of a case where the content of the thread does not directly relate to the given API method. The figure contains a paragraph and a code snippet of the Stack Overflow thread with ID 56135373³. org.mockito.stubbing.OngoingStubbing.thenReturn⁴ is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only shows the user's view of the code snippet, without a description mentioning the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as "This works like charm" do not provide much information to identify whether the observed API method refers to the given API.

³ https://stackoverflow.com/questions/56135373
⁴ https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html

Fig. 4: Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread.

Fig. 5: Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread.

Therefore, we leverage the content of the thread, which might be relevant to the content of the API method. For example, in the thread above, its title, shown in Figure 6a, "Optional cannot be returned by stream() in Mockito Test classes", relates to the comment of the given API, "Sets a return value to be returned when the method is called", shown in Figure 6b. Due to this feature, FACOS successfully considers this thread relevant, while DATYS misses it.

(2) The irrelevant thread contains the type name of the given API method.

An example of this case is shown in Figure 5. In the thread⁵, the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, "is", which we highlighted. Since the type of the given API method (e.g., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learnt by the API relevance classifier, FACOS is able to detect the irrelevance between the textual content and code snippet around the word "is" on the one hand, and the API comment and implementation code on the other. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

⁵ https://stackoverflow.com/questions/16919751

Fig. 6: The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn in Figure 6a and the textual content (i.e., the title) of thread 56135373 in Figure 6b.

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude a thread⁶ from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when another API method has a functionality similar to that of the given API method. Such methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service⁷). Since both of them have an API method whose simple name is mock, and both mock methods have the same API signature (i.e., parameters and return type), it would be easy to mistake one for the other, even for a human. Figure 8 shows the comment of API method org.mockito.Mockito.mock⁸, which is "Creates mock object of given class or interface". Because of the similarity between the comment of the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly recognizes that the simple API name mock in the thread refers to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.

⁶ https://stackoverflow.com/questions/30127057
⁷ https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
⁸ https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html

Fig. 7: Thread 30127057 on Stack Overflow, which FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8: The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. In some libraries, methods are chained (as in Figure 4), so the type is not displayed, and a type system is required to determine it. In addition, the type system is not 100 percent reliable due to a lack of import information, variables, etc. Conversely, it is ineffective to rely only on the similarity of word representations provided by language models for API content search, which differs from other tasks such as code search. In code search, because the terms "transfer" and "convert" are both related to currency, the method named "transferSGDToUSD" might be the answer to the query "convert SGD to USD". However, searching for an API method requires an explicit mention of the API name in the query, which can be difficult, as different APIs may share the same API method name. Solving the problem solely with a model that learns only the syntactic meaning of textual and code content is therefore insufficient. Hence, we incorporated semantic information for API content search, which produced better results, as demonstrated by our experiments.

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtain our dataset from another work, and we run the baseline using the code provided by its authors. We also checked our code multiple times to ensure that we did not make mistakes. We therefore believe that the threats to internal validity are small. We also release the dataset and the code for our experiments for all to use.

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. This experiment focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also talk about API issues. One potential platform is Reddit, which has subreddits (i.e., places gathering Reddit's threads that discuss a particular problem) on programming languages and frameworks. Like Stack Overflow, its threads have a title, textual content, and code content. This similarity suggests that we can potentially apply FACOS to Reddit too; we leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features required by the approach (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) can be found in other programming languages. We leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task, and many works in software engineering have used precision, recall, and F1-score as the evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threat is minimal.

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. They fall into two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As the names suggest, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches such as the Vector Space Model and Latent Semantic Indexing to disambiguate API mentions [18]–[20], while others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate API mentions in Stack Overflow threads.

API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of finding threads that are relevant to an API: when we disambiguate an API mention in a thread, the disambiguated API is relevant to the thread, as the thread is talking about that API.

B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding in existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. Rather than assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a single user query into a fixed-length context vector to generate an API sequence.

Other studies have exploited different aspects of APIs and natural language to better retrieve APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to compute document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built DistilBERT-based argumentation retrieval for answering comparative questions. Dibia [34] and Vale and Maia [35] developed, respectively, a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code search and completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API that we are searching for.

C. Contribution of StackOverflow for API documentation

Treude and Robillard [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas Squire [38] measured the utility of Stack Overflow as social media for developer support. Ahasanuzzaman et al. [39] and Baltes et al. [40] worked on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow, respectively. These studies fall into two notable camps: some, like [41] and [42], study how API documentation fails and how APIs are misused on Stack Overflow, while others [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow, for example for automated API documentation with tutorials. Similarly, crowdsourced knowledge was highlighted by [45] and [46], which explored innovation diffusion through link sharing and web resource recommendation for hyperlinks on Stack Overflow.

Our work supports this line of study: FACOS can automatically find threads about a particular API in Stack Overflow, which can then be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

Several works focus on disambiguation tasks [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies address a wide range of problems by resolving lexical ambiguity. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Others have associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Named entity recognition and disambiguation has also been used to add meaning to social network posts. Entity-fishing tools were also developed to provide a recognition and disambiguation service. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We have added a weight parameter to balance the use of syntactic and semantic information for retrieving API mentions and related threads, and we have demonstrated its utility through an ablation study. In the future, we plan to improve our approach with a larger dataset that contains more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package: The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, "What makes apis hard to learn? answers from developers," IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, "What do client developers concern when using web apis? an empirical study on developer forums and stack overflow," in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, "How do api changes trigger stack overflow discussions? a study on the android sdk," in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.

[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, "Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow," Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, "Automatic mining of opinions expressed about apis in stack overflow," IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, "Disambiguating mentions of api methods in stack overflow via type scoping," in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, "Rack: Automatic api recommendation using crowdsourced knowledge," in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., "Codebert: A pre-trained model for programming and natural languages," arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "Electra: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, "Apireal: an api recognition and linking approach for online developer forums," Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, "Identifying self-admitted technical debt in open source projects using text mining," Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, "Categorizing the content of github readme files," Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, "Learning from examples to find fully qualified names of api elements in code snippets," in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, "Recovering traceability links between code and documentation," TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, "Recovering documentation-to-source-code traceability links using latent semantic indexing," in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, "Linking e-mails and source code artifacts," in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, "Recovering traceability links between an api and its learning resources," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, "Live api documentation," in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, "Statistical learning of api fully qualified names in code snippets of online forums," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, "Codehow: Effective code search based on api understanding and extended boolean model (e)," in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, "Deep api learning," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, "Complementing global and local contexts in representing api descriptions to improve api retrieval tasks," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, "Leveraging usage similarity for effective retrieval of examples in code repositories," in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE '10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, "From word embeddings to document similarities for improved information retrieval in software engineering," in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE '16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, "Exploiting user knowledge during retrieval of semantically annotated api operations," in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR '11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, "Trans^3: A transformer-based framework for unifying code summarization and code search," arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, "Api2com: On the improvement of automatically generated code comments using api documentations," arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, "Distilbert-based argumentation retrieval for answering comparative questions," Working Notes of CLEF, 2021.

[34] V. Dibia, "Neuralqa: A usable library for question answering (contextual query expansion + bert) on large datasets," arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, "Towards a question answering assistant for software development using a transformer-based language model," arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, "An empirical study on the usage of transformer models for code completion," arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, "Augmenting api documentation with insights from stack overflow," in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, ""should we move to stack overflow?" measuring the utility of social media for developer support," in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, "Classifying stack overflow posts on api issues," in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, "Contextual documentation referencing on stack overflow," IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, "How api documentation fails," IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, "Are code examples on an online q&a forum reliable? a study of api misuse on stack overflow," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, "Crowdsourced knowledge on stack overflow: A systematic mapping study," in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, "Automated api documentation with tutorials generated from stack overflow," in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, "A study of innovation diffusion through link sharing on stack overflow," in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.

[46] J. Li, Z. Xing, D. Ye, and X. Zhao, "From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, "Statistical translation of english texts to api code templates," in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, "Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance," International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, "Geotxt: A web api to leverage place references in text," in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR '13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, "Senserelate:: Targetword-a generalized framework for word sense disambiguation," in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation," in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, "entity-fishing: a dariah entity recognition and disambiguation service," Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, "Do we need entity-centric knowledge bases for entity disambiguation?" in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, "Geek: Incremental graph-based entity disambiguation," in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, "Combining heterogeneous classifiers for word sense disambiguation," in Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, "A fully unsupervised word sense disambiguation method using dependency knowledge," in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.

  • I. Introduction
  • II. Preliminaries
    • II-A. DATYS
    • II-B. CodeBERT
  • III. Approach Overview
    • III-A. Task Definition
    • III-B. Architecture
  • IV. FACOS
    • IV-A. DATYS+
    • IV-B. API relevance embedding
    • IV-C. API relevance classifier
    • IV-D. Computing joint relevance score
  • V. Experiment
    • V-A. Dataset and Experimental Settings
    • V-B. Metrics
    • V-C. Research Questions
  • VI. Result
    • VI-A. RQ1: FACOS Effectiveness
    • VI-B. RQ2: Ablation Study
    • VI-C. RQ3: Effect of the weighting factor
  • VII. Discussion
    • VII-A. Cases where FACOS outperforms DATYS
    • VII-B. Case where FACOS fails to exclude irrelevant threads
    • VII-C. Adding Semantic Information for API Content Search
    • VII-D. Threats to validity
  • VIII. Related Work
    • VIII-A. API Disambiguation
    • VIII-B. API Resource Retrieval
    • VIII-C. Contribution of StackOverflow for API documentation
    • VIII-D. Word Sense and Entity Disambiguation Study
  • IX. Conclusion and Future Work
  • References

TABLE I: Number of API relevance embeddings in each set.

Set            API relevance embeddings
Training set   57,690
Testing set    26,212

0, 0.1, 0.2, ..., 0.9, 1.0. The value of x giving the highest performance on the training data is then chosen.

V. EXPERIMENT

A. Dataset and Experimental Settings

We use the dataset provided in the DATYS work [6] to evaluate both FACOS and DATYS. We split the 380 Stack Overflow threads into 253 training threads and 127 testing threads, a ratio of 2:1. The training threads are used to train the API relevance classifier, while the testing threads are used to evaluate FACOS and DATYS. Next, as mentioned in Section IV-B, for each Stack Overflow thread in the training threads, we extract its thread embeddings, and these embeddings are grouped into a training set. Similarly, for each Stack Overflow thread in the testing threads, we extract its thread embeddings and group them into a testing set. The numbers of API relevance embeddings in the dataset are shown in Table I: the training set and the testing set contain 57,690 and 26,212 embeddings, respectively.

To generate API relevance embeddings for training the API relevance classifier, for each thread in which the given API appears, we generate API relevance embeddings for the thread contents and method contents as described in Section IV-B. These embeddings have a positive label because they are created from the API that is referred to by the thread. To generate embeddings with a negative label for a thread, we find APIs that have the same simple name as the given API and are not mentioned in the thread. We then create API relevance embeddings from these APIs and label them as negative.
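The negative-sampling rule described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation: APIs are represented as fully qualified name strings, the simple name is taken to be the last dot-separated segment, and the helper names and the example API list are hypothetical.

```python
def simple_name(fully_qualified_name: str) -> str:
    """Return the last dot-separated segment, e.g. 'mock' for 'org.mockito.Mockito.mock'."""
    return fully_qualified_name.rsplit(".", 1)[-1]


def negative_candidates(given_api, all_apis, mentioned_in_thread):
    """APIs that share the given API's simple name but are not mentioned in the thread."""
    target = simple_name(given_api)
    return [
        api
        for api in all_apis
        if api != given_api
        and simple_name(api) == target
        and api not in mentioned_in_thread
    ]


# Hypothetical API inventory, for illustration only.
all_apis = [
    "org.mockito.Mockito.mock",
    "org.mockito.Mockito.when",
    "org.powermock.api.mockito.PowerMockito.mock",
]
negatives = negative_candidates(
    given_api="org.mockito.Mockito.mock",
    all_apis=all_apis,
    mentioned_in_thread={"org.mockito.Mockito.mock"},
)
print(negatives)
```

Each returned API would then be paired with the thread to form a negatively labelled API relevance embedding.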

The training set covers 344 APIs, which are used to generate the training method embeddings. The testing set covers 181 APIs, which are used to generate the testing method embeddings. Table II shows the numbers of positive and negative API relevance embeddings created in the training and testing sets. Within each set of threads, there are approximately four times as many negative API relevance embeddings as positive ones. Due to this imbalance, positive API relevance embeddings are randomly up-sampled to balance the two classes during the training of the API relevance classifier.
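Random up-sampling of the minority class can be sketched as follows. The text does not spell out the exact procedure, so this sketch assumes sampling with replacement until the positive class matches the negative class in size. The function name and the fixed seed are our own choices.

```python
import random


def upsample_positives(positives, negatives, seed=0):
    """Return the positives plus random duplicates so both classes have equal size.

    Assumption: duplicates are drawn uniformly with replacement.
    """
    rng = random.Random(seed)
    deficit = len(negatives) - len(positives)
    if deficit <= 0 or not positives:
        return list(positives)
    extra = rng.choices(positives, k=deficit)
    return list(positives) + extra


# Toy data with a 1:4 imbalance, similar in spirit to Table II.
positives = ["p0", "p1", "p2"]
negatives = [f"n{i}" for i in range(12)]
balanced_positives = upsample_positives(positives, negatives)
print(len(balanced_positives), len(negatives))
```

Up-sampling should be applied to the training set only, so that the testing set keeps its original class distribution.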

The API relevance classifier is trained for 6 epochs on the training data; after these 6 epochs, the value of the loss function has largely converged. The learning rate is set to $10^{-3}$.

B Metrics

To evaluate the proposed approach on identifying threadsthat are relevant to an API we use three metrics Precision

TABLE II Number of positive and negative API relevanceembeddings in each set

Positive Embeddings in Training set 9934Negative Embeddings in Training set 47756Positive Embeddings in Testing set 5607Negative Embeddings in Testing set 20605

Recall and F1-score In order to calculate the three afore-mentioned metrics True Positive False Positive and FalseNegative should be defined first Our task focuses on findingthreads that actually refer to a given API True Positive is thecase where a thread is deemed to be relevant by the approach isindeed relevant False Positive is the case where the thread thatis deemed to be relevant by the approach is actually irrelevantFalse Negative is the case where a threads is deemed to beirrelevant by the approach is actually relevant The metrics arecalculated using the following formulas

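\[ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2} \]

\[ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3} \]

\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]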

We measure the above scores for every given API in the testing set and report their averages.
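The per-API metrics and their averaging can be sketched as follows. The set-based representation of predicted and relevant threads and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch; the paper does not state how undefined cases are handled.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(predicted: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Compute Precision, Recall, and F1-score for one API from sets of thread IDs."""
    tp = len(predicted & relevant)   # deemed relevant and actually relevant
    fp = len(predicted - relevant)   # deemed relevant but actually irrelevant
    fn = len(relevant - predicted)   # deemed irrelevant but actually relevant
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_scores(per_api: Dict[str, Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Average the three metrics over all APIs (each value is (predicted, relevant))."""
    scores = [precision_recall_f1(pred, rel) for pred, rel in per_api.values()]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


p, r, f = precision_recall_f1({"t1", "t2", "t3"}, {"t1", "t2", "t4", "t5"})
```

Because F1-score is computed per API and then averaged, the reported average F1-score need not equal the harmonic mean of the average Precision and average Recall.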

C. Research Questions

Research Question 1: Can FACOS perform better than the baseline (DATYS)?
The baseline, DATYS, was designed for the task of API mention disambiguation. We adapt it to our task of finding threads that are relevant to an API: if DATYS finds that an API is mentioned in a thread, the thread is considered relevant to the API. To evaluate the improvement of FACOS over DATYS, we evaluate both on the testing set and compare them in terms of F1-score. We also analyze some cases that FACOS can resolve and DATYS cannot in Section VII-A.

Research Question 2: How well does each component of FACOS perform?
There are three possible variants of FACOS, depending on which components are included: (1) FACOS with the API relevance classifier only, (2) FACOS with DATYS+ only, and (3) FACOS with both DATYS+ and the API relevance classifier. The API relevance classifier is a semantic-based algorithm, while DATYS+ is a syntactic-based algorithm. In this study, we aim to analyze the contribution of each component of FACOS. From the analysis, we would like to answer the question of whether combining a semantic-based algorithm and a syntactic-based algorithm leads to a better result than running them individually.

Research Question 3: How does the weighting factor affect the F1-score of the relevant thread classification? Does our strategy work well?
The weighting factor is an importance factor that affects how well FACOS performs. We select its value based on the best performance in the training data, and we analyze whether this strategy also leads to the best performance in the testing data. We vary the value of the weighting factor in both the training data and the testing data. The values that we use are 0, 0.1, 0.2, ..., 0.9, 1.0.

TABLE III: FACOS vs. DATYS in terms of F1-score in the testing set

Approach    Avg. Precision    Avg. Recall    Avg. F1-score
DATYS       0.7441            0.7703         0.7340
FACOS       0.8697            0.9016         0.8730

TABLE IV: Contribution of FACOS Components

Components                                  Avg. Precision    Avg. Recall    Avg. F1-score
FACOS                                       0.8697            0.9016         0.8730
FACOS with only API relevance classifier    0.3408            0.3658         0.3408
FACOS with only DATYS+                      0.8620            0.8723         0.8530
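The selection strategy for the weighting factor described under Research Question 3 can be sketched as a small grid search. The joint relevance score is defined in Section IV-D; the convex combination used here is an assumption that is merely consistent with Table IV (x = 0 corresponds to the classifier-only variant, x = 1.0 to the DATYS+-only variant). The names `joint_score`, `candidate_weights`, `select_weight`, and the `evaluate_f1` callback are hypothetical.

```python
from typing import Callable, List, Sequence


def joint_score(syntactic: float, semantic: float, x: float) -> float:
    """Assumed form: weight x on the syntactic (DATYS+) score, 1 - x on the semantic score."""
    return x * syntactic + (1.0 - x) * semantic


def candidate_weights(steps: int = 10) -> List[float]:
    """Return 0, 0.1, ..., 1.0 for the default number of steps."""
    return [i / steps for i in range(steps + 1)]


def select_weight(evaluate_f1: Callable[[float], float], candidates: Sequence[float]) -> float:
    """Pick the candidate with the highest average F1-score on the training data."""
    return max(candidates, key=evaluate_f1)


# Toy stand-in for "average F1-score on the training set at weight x".
chosen = select_weight(lambda x: -(x - 0.3) ** 2, candidate_weights())
```

In an actual run, `evaluate_f1` would score every training thread with `joint_score`, classify the threads, and return the average F1-score; the chosen weight is then applied unchanged to the testing set.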

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. FACOS in general outperforms DATYS. On average, FACOS achieves an F1-score of 0.873, which is an absolute improvement of 13.9% over DATYS (0.7340). FACOS also beats DATYS in terms of precision and recall.
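As a quick check against Table III, the reported improvement corresponds to the absolute difference in average F1-score. The relative gain is not stated in the text and is computed here only for orientation:

\[ 0.8730 - 0.7340 = 0.1390 \quad (13.9 \text{ percentage points}), \qquad \frac{0.8730 - 0.7340}{0.7340} \approx 0.189 \]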

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the "API relevance classifier-only" version of FACOS gives the worst result, the API relevance classifier may not be able to resolve the task well on its own. This might be partly due to the limited amount of training data (i.e., only 253 training threads). The "DATYS+ only" version of FACOS performs much better than the "API relevance classifier-only" version. However, the full FACOS is still better than both of them, which demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table VI shows the performance of FACOS in the training set when we vary the value of the weighting factor x. Similarly, Table V shows the performance of FACOS in the testing set when we vary the value of the weighting factor. The chosen value of x is 0.3, which gives the highest average F1-score in the training set (0.8329). The highest F1-score in the testing set is also achieved when the weighting factor is equal to 0.3. This demonstrates that our strategy of picking the value of the weighting factor that leads to the best performance in the training data works well.

TABLE V: Average Precision, average Recall, and average F1-score of testing sets when weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.3420            0.3658         0.3408
0.1    0.4485            0.4653         0.4441
0.2    0.8650            0.8925         0.8641
0.3    0.8697            0.9016         0.8730
0.4    0.8684            0.8934         0.8689
0.5    0.8588            0.8723         0.8097
0.6    0.8588            0.8723         0.8510
0.7    0.8606            0.8723         0.8521
0.8    0.8606            0.8723         0.8521
0.9    0.8606            0.8723         0.8521
1.0    0.8620            0.8723         0.8530

TABLE VI: Average Precision, average Recall, and average F1-score of training sets when weighting factor varies

x      Avg. Precision    Avg. Recall    Avg. F1-score
0      0.6565            0.6685         0.6473
0.1    0.7111            0.7272         0.7080
0.2    0.8261            0.8506         0.8269
0.3    0.8328            0.8498         0.8329
0.4    0.8265            0.8410         0.8254
0.5    0.8159            0.8234         0.8097
0.6    0.8132            0.8234         0.8079
0.7    0.8132            0.8234         0.8079
0.8    0.8132            0.8234         0.8079
0.9    0.8132            0.8234         0.8079
1.0    0.8180            0.8191         0.8073
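The claim that the same weighting factor is optimal in both sets can be checked directly from the F1-score columns of Tables V and VI. The values below are copied as reported (Table VI for the training set, Table V for the testing set, following the table captions); the helper name `best_weight` is introduced only for this illustration.

```python
from typing import List

WEIGHTS: List[float] = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Average F1-scores as reported in Table VI (training) and Table V (testing).
TRAIN_F1: List[float] = [0.6473, 0.7080, 0.8269, 0.8329, 0.8254, 0.8097,
                         0.8079, 0.8079, 0.8079, 0.8079, 0.8073]
TEST_F1: List[float] = [0.3408, 0.4441, 0.8641, 0.8730, 0.8689, 0.8097,
                        0.8510, 0.8521, 0.8521, 0.8521, 0.8530]


def best_weight(weights: List[float], f1_scores: List[float]) -> float:
    """Return the weight whose F1-score is highest (first one on ties)."""
    best_index = max(range(len(weights)), key=lambda i: f1_scores[i])
    return weights[best_index]


chosen_on_training = best_weight(WEIGHTS, TRAIN_F1)
optimum_on_testing = best_weight(WEIGHTS, TEST_F1)
```

Both calls return the same weight, which is what the selection strategy relies on.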

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method.

Figure 4 shows an example in which the textual content of the thread does not explicitly relate to the given API method. The figure shows the paragraph and code snippet of the Stack Overflow thread with ID 56135373 (footnote 3). org.mockito.stubbing.OngoingStubbing.thenReturn (footnote 4) is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only conveys the user's view of the code snippet, without describing the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as "This works like charm" do not provide much information for identifying whether the observed API method refers to the given API.

3 https://stackoverflow.com/questions/56135373
4 https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html

Fig. 4: Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread

Fig. 5: Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread

Therefore, we leverage the content of the thread, which might be relevant to the content of the API method. For example, in the thread above, its title, which is shown in Figure 6a, "Optional cannot be returned by stream() in Mockito Test classes", relates to the comment of the given API, which is "Sets a return value to be returned when the method is called", in Figure 6b. Due to this feature, FACOS successfully considers this thread relevant, while DATYS misses it.

(2) The irrelevant thread contains the type name of the given API method.
An example of this case is shown in Figure 5. In the thread (footnote 5), the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, "is", which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learned by the API relevance classifier, FACOS is able to detect the irrelevance between the textual content and code snippet around the word "is" on the one hand and the API comment and implementation code on the other. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

5 https://stackoverflow.com/questions/16919751

Fig. 6: The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn in Figure 6a and the textual content (i.e., the title) of thread 56135373 in Figure 6b

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude the thread (footnote 6) from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when there is an API method that has similar functionality to the given API method. These two methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions, such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service; footnote 7). Since both of them have an API method whose simple name is mock, and both mock methods have the same API signature (i.e., parameters and return type), it would be easy to mistake one for the other, even for a human. Figure 8 shows the comment of the API method org.mockito.Mockito.mock (footnote 8), which is "Creates mock object of given class or interface". Because of the similarity between the comment of the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly recognizes that the simple API name mock in the thread refers to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.

6 https://stackoverflow.com/questions/30127057
7 https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
8 https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7: Thread 30127057 on Stack Overflow that FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock

Fig. 8: The API comment of method org.mockito.Mockito.mock

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. For some libraries, methods are chained (as in Figure 4), so the type is not displayed, and a type system is required to determine it. In addition, the type system is not 100 percent reliable due to a lack of import information, variables, etc. Relying only on the similarity of word representations provided by language models is also ineffective for API content search, which differs from other tasks such as code search. In code search, because the terms "transfer" and "convert" are both related to currency, the method named "transferSGDToUSD" might be the answer to the query "convert SGD to USD". However, searching for an API method requires an explicit mention of the API name in the query, which is difficult to resolve because different APIs may share the same method name. Solving the problem using only a model that learns the syntactic meaning of textual and code content is therefore insufficient. We thus incorporated semantic information into API content search, which produced better results, as our experiments demonstrate.

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtain our dataset from another work and run the baseline using the code provided by the author. We checked our code multiple times to ensure that we did not make mistakes. We believe there is little threat to internal validity. We also release the dataset and the code for our experiments for all to use.

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. This experiment focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also discuss API issues. One potential platform is Reddit, which has subreddits (i.e., places gathering Reddit threads that discuss a particular problem) about programming languages and frameworks. Like Stack Overflow, its threads have a title, textual content, and code content. This similarity suggests that we can potentially apply FACOS to Reddit too; we leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features required by the approach (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) can be found in other programming languages. We leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task, and many works in software engineering have used precision, recall, and F1-score as evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threat is minimal.

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. There are two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As suggested by the names, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches, such as the Vector Space Model and Latent Semantic Indexing, to disambiguate API mentions [18]–[20], while others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API in a thread, the disambiguated API is relevant to the thread, as the thread is talking about the API.


B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding ability of existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. As opposed to assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a user query into a fixed-length context vector to generate an API sequence.

Other studies have also exploited different aspects of APIs and natural language to better retrieve APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to compute document similarities for improved API retrieval [28], exploiting user knowledge during retrieval of semantically annotated API operations [29], and addressing the task-API knowledge gap [30].

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built DistilBERT-based argumentation retrieval for answering comparative questions. Dibia [34] and Vale and Maia [35] developed, respectively, a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API.

C. Contribution of Stack Overflow for API documentation

Treude et al. [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas Squire [38] measured the utility of Stack Overflow as social media for developer support. Ahasanuzzaman et al. [39] and Baltes et al. [40] worked on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow, respectively. These studies fall into two notable groups: some research, such as [41] and [42], studies how API documentation fails and how APIs are misused on Stack Overflow, while other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow, for example for automated API documentation with tutorials. Similarly, crowdsourced knowledge was examined by [45] and [46], who explored innovation diffusion and web resource recommendation for hyperlinks through link sharing on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find threads about a particular API on Stack Overflow, which can be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

Several works focus on disambiguation tasks [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies address a wide range of problems by resolving lexical ambiguity. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Researchers have also associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Named entity recognition and disambiguation have been used to add meaning to social network posts. Entity-fishing tools were also developed to facilitate the recognition and disambiguation service. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We have added a weight parameter to balance the use of syntactic and semantic information for retrieving API mentions and related threads, and we have shown the utility of this weighting factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package: The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, "What makes apis hard to learn? answers from developers," IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, "What do client developers concern when using web apis? an empirical study on developer forums and stack overflow," in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vásquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, "How do api changes trigger stack overflow discussions? a study on the android sdk," in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, "Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow," Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, "Automatic mining of opinions expressed about apis in stack overflow," IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, "Disambiguating mentions of api methods in stack overflow via type scoping," in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, "Rack: Automatic api recommendation using crowdsourced knowledge," in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., "Codebert: A pre-trained model for programming and natural languages," arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "Electra: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, "Apireal: an api recognition and linking approach for online developer forums," Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, "Identifying self-admitted technical debt in open source projects using text mining," Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, "Categorizing the content of github readme files," Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, "Learning from examples to find fully qualified names of api elements in code snippets," in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, "Recovering traceability links between code and documentation," TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, "Recovering documentation-to-source-code traceability links using latent semantic indexing," in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, "Linking e-mails and source code artifacts," in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, "Recovering traceability links between an api and its learning resources," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, "Live api documentation," in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, "Statistical learning of api fully qualified names in code snippets of online forums," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, "Codehow: Effective code search based on api understanding and extended boolean model (e)," in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, "Deep api learning," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, "Complementing global and local contexts in representing api descriptions to improve api retrieval tasks," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, "Leveraging usage similarity for effective retrieval of examples in code repositories," in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE '10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, "From word embeddings to document similarities for improved information retrieval in software engineering," in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE '16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, "Exploiting user knowledge during retrieval of semantically annotated api operations," in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR '11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "Api method recommendation without worrying about the task-api knowledge gap," in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, "Trans^3: A transformer-based framework for unifying code summarization and code search," arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, "Api2com: On the improvement of automatically generated code comments using api documentations," arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, "Distilbert-based argumentation retrieval for answering comparative questions," Working Notes of CLEF, 2021.

[34] V. Dibia, "Neuralqa: A usable library for question answering (contextual query expansion + bert) on large datasets," arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, "Towards a question answering assistant for software development using a transformer-based language model," arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, "An empirical study on the usage of transformer models for code completion," arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, "Augmenting api documentation with insights from stack overflow," in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, ""should we move to stack overflow?" measuring the utility of social media for developer support," in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, "Classifying stack overflow posts on api issues," in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, "Contextual documentation referencing on stack overflow," IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, "How api documentation fails," IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, "Are code examples on an online q&a forum reliable? a study of api misuse on stack overflow," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, "Crowdsourced knowledge on stack overflow: A systematic mapping study," in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, "Automated api documentation with tutorials generated from stack overflow," in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, "A study of innovation diffusion through link sharing on stack overflow," in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, "From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, "Statistical translation of english texts to api code templates," in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, "Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance," International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, "Geotxt: A web api to leverage place references in text," in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR '13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, "Senserelate::Targetword-a generalized framework for word sense disambiguation," in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation," in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, "entity-fishing: a dariah entity recognition and disambiguation service," Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, "Do we need entity-centric knowledge bases for entity disambiguation?" in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, "Geek: Incremental graph-based entity disambiguation," in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, "Combining heterogeneous classifiers for word sense disambiguation," in Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, "A fully unsupervised word sense disambiguation method using dependency knowledge," in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.



TABLE III: FACOS vs. DATYS in terms of F1-score in the testing set

Approach   Avg. Precision   Avg. Recall   Avg. F1-score
DATYS      0.7441           0.7703        0.7340
FACOS      0.8697           0.9016        0.8730

TABLE IV: Contribution of FACOS Components

Components                                 Avg. Precision   Avg. Recall   Avg. F1-score
FACOS                                      0.8697           0.9016        0.8730
FACOS with only API relevance classifier   0.3408           0.3658        0.3408
FACOS with only DATYS+                     0.8620           0.8723        0.8530

The weighting factor is an importance factor that affects how well FACOS performs. We select the weighting factor that gives the best performance on the training data, and we analyze whether this strategy also leads to the best performance on the testing data. To do so, we vary the value of the weighting factor on both the training data and the testing data. The values that we use are 0, 0.1, 0.2, ..., 0.9, 1.0. We then check whether the value that leads to the best performance on the training data also leads to the best performance on the testing data.
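The selection strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the joint relevance score is assumed to be a convex combination of the DATYS+ score and the API relevance classifier score, which appears consistent with Tables IV and V (the x = 0 row matches the classifier-only F1-score and the x = 1.0 row matches the DATYS+-only scores). The names `joint_score` and `pick_weight` are hypothetical, and the F1 values are copied from the table reporting the training sets.

```python
from typing import Callable, Sequence


def joint_score(datys_plus: float, classifier: float, x: float) -> float:
    """Assumed form of the joint score: x weights the syntactic DATYS+ score,
    (1 - x) weights the semantic classifier score."""
    return x * datys_plus + (1.0 - x) * classifier


def pick_weight(train_f1: Callable[[float], float],
                candidates: Sequence[float]) -> float:
    """Return the candidate weight with the highest training F1-score."""
    return max(candidates, key=train_f1)


# Candidate grid 0, 0.1, ..., 1.0, as described in the text.
GRID = [i / 10 for i in range(11)]

# Average training-set F1-scores reported for each value of x.
TRAIN_F1 = {
    0.0: 0.6473, 0.1: 0.7080, 0.2: 0.8269, 0.3: 0.8329,
    0.4: 0.8254, 0.5: 0.8097, 0.6: 0.8079, 0.7: 0.8079,
    0.8: 0.8079, 0.9: 0.8079, 1.0: 0.8073,
}

best_x = pick_weight(TRAIN_F1.__getitem__, GRID)
print(best_x)  # 0.3
```

In an actual experiment, `train_f1` would rerun the thread retrieval on the training data for each candidate weight instead of looking up precomputed values.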

VI. RESULT

A. RQ1: FACOS Effectiveness

Table III shows the performance of DATYS and FACOS in finding threads that are relevant to the given API. In general, FACOS outperforms DATYS. On average, FACOS achieves an F1-score of 0.873, which is an improvement of 13.9% compared to DATYS. FACOS also beats DATYS in terms of precision and recall.
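As a quick check of the arithmetic, the reported 13.9% matches the absolute difference between the two average F1-scores in Table III (in percentage points). The short sketch below computes that difference and, for comparison, the relative gain; the variable names are illustrative only.

```python
# Average F1-scores taken from Table III.
datys_f1 = 0.7340
facos_f1 = 0.8730

absolute_gain = facos_f1 - datys_f1        # difference in F1-score
relative_gain = absolute_gain / datys_f1   # gain relative to the DATYS score

print(f"absolute: {absolute_gain:.3f}")    # absolute: 0.139
print(f"relative: {relative_gain:.1%}")    # relative: 18.9%
```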

B. RQ2: Ablation Study

Table IV shows how well each component of FACOS performs. Since the “API relevance classifier-only” version of FACOS gives the worst result, the API relevance classifier may not be able to resolve the task well on its own. This might be partly due to the limited amount of training data (i.e., only 253 training threads). In addition, the “DATYS+ only” version of FACOS performs much better than the “API relevance classifier-only” version. However, FACOS is still better than both of them, which demonstrates that both components are useful and essential.

C. RQ3: Effect of the weighting factor

Table VI shows the performance of FACOS on the training sets when we vary the value of the weighting factor x. The bold numbers in the table are the average F1-scores of the chosen value of x in the training sets. Similarly, Table V shows the performance of FACOS on the testing sets when we vary the value of the weighting factor. The highest F1-score for both the training and testing sets is achieved when the weighting factor is equal to 0.3. This demonstrates that our strategy of picking the value of the weighting factor that leads to the best performance on the training data works well.

TABLE V: Average precision, average recall, and average F1-score of testing sets when the weighting factor varies

x     Avg. Precision   Avg. Recall   Avg. F1-score
0     0.3420           0.3658        0.3408
0.1   0.4485           0.4653        0.4441
0.2   0.8650           0.8925        0.8641
0.3   0.8697           0.9016        0.8730
0.4   0.8684           0.8934        0.8689
0.5   0.8588           0.8723        0.8097
0.6   0.8588           0.8723        0.8510
0.7   0.8606           0.8723        0.8521
0.8   0.8606           0.8723        0.8521
0.9   0.8606           0.8723        0.8521
1.0   0.8620           0.8723        0.8530

TABLE VI: Average precision, average recall, and average F1-score of training sets when the weighting factor varies

x     Avg. Precision   Avg. Recall   Avg. F1-score
0     0.6565           0.6685        0.6473
0.1   0.7111           0.7272        0.7080
0.2   0.8261           0.8506        0.8269
0.3   0.8328           0.8498        0.8329
0.4   0.8265           0.8410        0.8254
0.5   0.8159           0.8234        0.8097
0.6   0.8132           0.8234        0.8079
0.7   0.8132           0.8234        0.8079
0.8   0.8132           0.8234        0.8079
0.9   0.8132           0.8234        0.8079
1.0   0.8180           0.8191        0.8073

VII. DISCUSSION

A. Cases where FACOS outperforms DATYS

(1) The relevant thread does not contain the type name of the given API method.

Figure 4 shows an example of a case where the content of the thread does not appear to relate to the given API method. The figure contains the paragraphs and the code snippet of the Stack Overflow thread with ID 56135373³. org.mockito.stubbing.OngoingStubbing.thenReturn⁴ is the API method the thread refers to.

From the content of the thread, it would be difficult to find the relevance between the text written in the paragraphs and the given API method (i.e., org.mockito.stubbing.OngoingStubbing.thenReturn), since the type (i.e., OngoingStubbing) does not appear in the thread. The text only shows the user's view of the code snippet, without a description mentioning the application or usage of the observed API method invocation (e.g., thenReturn in the code snippet of Figure 4). Sentences such as “This works like charm” do not provide much information to identify whether the observed API method refers to the given API.

Fig. 4: Thread 56135373 on Stack Overflow, where the API is referred to by a code snippet of the thread.

Fig. 5: Thread 16919751 on Stack Overflow, where the API com.google.common.base.CharMatcher.is is not referred to by the content of the thread.

³ https://stackoverflow.com/questions/56135373
⁴ https://javadoc.io/doc/org.mockito/mockito-all/2.0.2-beta/org/mockito/stubbing/OngoingStubbing.html

Therefore, we leverage the content of the thread, which might be relevant to the content of the API method. For example, in the thread above, its title, which is shown in Figure 6a, “Optional cannot be returned by stream() in Mockito Test classes”, relates to the comment of the given API, which is “Sets a return value to be returned when the method is called”, in Figure 6b. Due to this feature, FACOS can successfully consider this thread as relevant, while DATYS missed it.

(2) The irrelevant thread contains the type name of the given API method.

An example of this case is shown in Figure 5. In the thread⁵, the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, “is”, which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learnt by the API relevance classifier, FACOS is able to detect the irrelevance between the textual content and code snippet around the word “is” and the API comment and implementation code. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

⁵ https://stackoverflow.com/questions/16919751
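The two cases above can be illustrated with a deliberately naive syntactic rule. The sketch below is a toy stand-in and not the DATYS algorithm: the function `naive_type_match` is hypothetical, and the thread texts are invented for illustration, except for the thread title quoted in the text.

```python
import re


def naive_type_match(thread_text: str, type_name: str, simple_name: str) -> bool:
    """Toy rule: accept a thread when both the type name and the method's
    simple name occur as whole identifiers anywhere in the thread."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", thread_text))
    return type_name in tokens and simple_name in tokens


# Case (1): a chained Mockito call, so the type OngoingStubbing never appears.
mockito_thread = """
Optional cannot be returned by stream() in Mockito Test classes
when(repository.findAll().stream()).thenReturn(Stream.of(item));
This works like charm
"""

# Case (2): CharMatcher is mentioned, and "is" occurs as an ordinary English word.
charmatcher_thread = """
What is the fastest way to strip characters? I tried CharMatcher:
String s = CharMatcher.anyOf("abc").removeFrom(input);
"""

# The relevant thread is missed because the type name is absent.
print(naive_type_match(mockito_thread, "OngoingStubbing", "thenReturn"))  # False
# The irrelevant thread is accepted because "is" matches by accident.
print(naive_type_match(charmatcher_thread, "CharMatcher", "is"))  # True
```

Both outcomes are errors for a purely word-based rule, which is the kind of gap the semantic score is meant to close.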

Fig. 6: The similarity in semantic meaning between the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn in Figure 6a and the textual content (i.e., the title) of thread 56135373 in Figure 6b.

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude the thread⁶ from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when there is another API method whose functionality is similar to that of the given API method. Such methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions, such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service⁷). Since both of them have an API method whose simple name is mock, and both mock methods have the same API signature (i.e., parameters and return type), it would be easy to mistake one for the other, even for a human. Figure 8 shows the comment of the API method org.mockito.Mockito.mock⁸, which is “Creates mock object of given class or interface”. Because of the similarity between the comment of the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly concludes that the simple API name mock in the thread refers to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.

⁶ https://stackoverflow.com/questions/30127057
⁷ https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
⁸ https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7: Thread 30127057 on Stack Overflow that FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8: The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. In some libraries, methods are chained (as in Figure 4), so the type is not displayed, and a type system is required to determine it. In addition, the type system is not 100 percent reliable due to a lack of import information, variables, etc. It is also ineffective to rely only on the similarity of word representations provided by language models for API content search. API content search differs from other tasks such as code search. In code search, because the terms “transfer” and “convert” are both related to currency, the method named “transferSGDToUSD” might be the answer to the query “convert SGD to USD”. However, searching for an API method requires explicit mention of the API name in the query, which could be difficult as different APIs may share the same API method names. Solving the problem solely with a model that only learns the syntactic meaning of textual and code content is therefore insufficient. Thus, we incorporated semantic information into API content search, which generated better results, as demonstrated by the results of our experiment.

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtain our dataset from another work. We also run the baseline using the code provided by its authors. We checked our code multiple times to ensure that we did not make mistakes. We believe there is little threat to internal validity. We also release the dataset and the code for our experiments for all to use.

A threat to external validity concerns whether the approach is applicable to platforms other than Stack Overflow. This experiment mainly focuses on Stack Overflow and the Java programming language; therefore, it is uncertain whether FACOS can be applied to other discussion venues that also talk about API issues. A potential platform might be Reddit, which has sub-reddits (i.e., places gathering Reddit’s threads discussing a particular problem) that discuss programming languages and frameworks. Like Stack Overflow, it has a title, textual content, and code content. This similarity suggests that we can potentially apply FACOS to Reddit too. We leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) required by the approach can be found in other programming languages. Therefore, we leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task. Many works in software engineering have used precision, recall, and F1-score as the evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threat is minimal.
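For reference, the three metrics can be computed per API from the set of retrieved threads and the set of truly relevant threads, then averaged. The sketch below assumes macro-averaging over APIs, which is one plausible reading of the "Avg" columns in the result tables; the function names and the thread IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(retrieved: Set[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1-score for one API's retrieved threads."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(per_api: Dict[str, Tuple[Set[str], Set[str]]]
                  ) -> Tuple[float, float, float]:
    """Average each metric over APIs (assumed averaging scheme)."""
    scores = [precision_recall_f1(ret, rel) for ret, rel in per_api.values()]
    n = len(scores)
    avg_p = sum(s[0] for s in scores) / n
    avg_r = sum(s[1] for s in scores) / n
    avg_f1 = sum(s[2] for s in scores) / n
    return avg_p, avg_r, avg_f1


# Made-up example: (retrieved thread IDs, relevant thread IDs) per API.
example = {
    "api_a": ({"t1", "t2", "t3"}, {"t1", "t2", "t4"}),
    "api_b": ({"t5"}, {"t5", "t6"}),
}
avg_p, avg_r, avg_f1 = macro_average(example)
print(f"{avg_p:.4f} {avg_r:.4f} {avg_f1:.4f}")  # 0.8333 0.5833 0.6667
```

Under this scheme the average F1-score is the mean of per-API F1-scores, so it need not equal the harmonic mean of the average precision and average recall.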

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. There are two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As suggested by their names, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches such as Vector Space Model and Latent Semantic Indexing to disambiguate the API mentions [18]–[20], while some others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and scope filter. Luong et al. [6] used type scoping to disambiguate the API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API in a thread, the disambiguated API is relevant to the thread, as the thread is talking about the API.


B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed Codehow to deal with the lack of query understanding ability of existing tools. By expanding a user query with APIs, Codehow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. As opposed to assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a single user query into a fixed-length context vector to generate an API sequence.

Other studies have also exploited different aspects of APIs and natural language to better retrieve the APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], employing word embeddings to document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built DistilBERT-based argumentation retrieval for answering comparative questions. Dibia [34] and Vale and Maia [35] developed a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model, respectively. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code search and completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API.

C. Contribution of Stack Overflow for API documentation

Treude et al. [37] studied augmenting API documentation with insights from Stack Overflow. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas [38] dubbed Stack Overflow the social media for developer support in terms of the utility it provides. [39] and [40] worked on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow, respectively. The dichotomy among these studies is notable: some research, such as [41] and [42], studies how API documentation fails and API misuse on Stack Overflow, while other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow, for example for automated API documentation with tutorials. Similarly, crowdsourced knowledge was hailed by [45] and [46], who explored innovation diffusion through link sharing and web resource recommendation for hyperlinks on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find threads about a particular API in Stack Overflow that can be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

There are several works focused on the disambiguation task [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have solved a myriad of problems by resolving lexical ambiguities. The task of word sense disambiguation is to identify a target word’s intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data. Researchers have also associated place-name mentions in unstructured text with their actual references in geographic space using word disambiguation. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Researchers used this approach to add meaning to social network posts through named entity recognition and disambiguation. Entity-fishing tools were also developed to facilitate entity recognition and disambiguation services. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We have added a weight parameter to balance the usage of syntactic and semantic information for retrieving API mentions and related threads. We have shown the utility of the weight factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.

Replication Package: The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, “What makes apis hard to learn? answers from developers,” IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, “What do client developers concern when using web apis? an empirical study on developer forums and stack overflow,” in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “How do api changes trigger stack overflow discussions? a study on the android sdk,” in proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, “Crowd documentation: Exploring the coverage and the dynamics of api discussions on stack overflow,” Georgia Institute of Technology, Tech. Rep, vol. 11, 2012.

[5] G. Uddin and F. Khomh, “Automatic mining of opinions expressed about apis in stack overflow,” IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, “Disambiguating mentions of api methods in stack overflow via type scoping,” in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “Api method recommendation without worrying about the task-api knowledge gap,” in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, “Rack: Automatic api recommendation using crowdsourced knowledge,” in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., “Codebert: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “Electra: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, “Apireal: an api recognition and linking approach for online developer forums,” Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, “Identifying self-admitted technical debt in open source projects using text mining,” Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, “Categorizing the content of github readme files,” Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, “Learning from examples to find fully qualified names of api elements in code snippets,” in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, “Recovering traceability links between code and documentation,” TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, “Recovering documentation-to-source-code traceability links using latent semantic indexing,” in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, “Linking e-mails and source code artifacts,” in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, “Recovering traceability links between an api and its learning resources,” in 2012 34th international conference on software engineering (icse). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, “Live api documentation,” in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, “Statistical learning of api fully qualified names in code snippets of online forums,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, “Codehow: Effective code search based on api understanding and extended boolean model (e),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, “Deep api learning,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, “Complementing global and local contexts in representing api descriptions to improve api retrieval tasks,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, “Leveraging usage similarity for effective retrieval of examples in code repositories,” in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE ’10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, “From word embeddings to document similarities for improved information retrieval in software engineering,” in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, “Exploiting user knowledge during retrieval of semantically annotated api operations,” in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR ’11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “Api method recommendation without worrying about the task-api knowledge gap,” in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, “Trans^3: A transformer-based framework for unifying code summarization and code search,” arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, “Api2com: On the improvement of automatically generated code comments using api documentations,” arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, “Distilbert-based argumentation retrieval for answering comparative questions,” Working Notes of CLEF, 2021.

[34] V. Dibia, “Neuralqa: A usable library for question answering (contextual query expansion+ bert) on large datasets,” arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, “Towards a question answering assistant for software development using a transformer-based language model,” arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, “An empirical study on the usage of transformer models for code completion,” arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, “Augmenting api documentation with insights from stack overflow,” in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, “"should we move to stack overflow?" measuring the utility of social media for developer support,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, “Classifying stack overflow posts on api issues,” in 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, “Contextual documentation referencing on stack overflow,” IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, “How api documentation fails,” Ieee software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, “Are code examples on an online q&a forum reliable?: a study of api misuse on stack overflow,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowdsourced knowledge on stack overflow: A systematic mapping study,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, “Automated api documentation with tutorials generated from stack overflow,” in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, “A study of innovation diffusion through link sharing on stack overflow,” in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, “From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, “Statistical translation of english texts to api code templates,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, “Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance,” International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, “Geotxt: A web api to leverage place references in text,” in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, “Senserelate::Targetword - a generalized framework for word sense disambiguation,” in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation,” in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, “entity-fishing: a dariah entity recognition and disambiguation service,” Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, “Do we need entity-centric knowledge bases for entity disambiguation?” in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, “Geek: Incremental graph-based entity disambiguation,” in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, “Combining heterogeneous classifiers for word sense disambiguation,” in Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, “A fully unsupervised word sense disambiguation method using dependency knowledge,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.

12

  • I Introduction
  • II Preliminaries
    • II-A DATYS
    • II-B CodeBERT
      • III Approach Overview
        • III-A Task Definition
        • III-B Architecture
          • IV FACOS
            • IV-A DATYS+
            • IV-B API relevance embedding
            • IV-C API relevance classifier
            • IV-D Computing joint relevance score
              • V Experiment
                • V-A Dataset and Experimental Settings
                • V-B Metrics
                • V-C Research Questions
                  • VI Result
                    • VI-A RQ1 FACOS Effectiveness
                    • VI-B RQ2 Ablation Study
                    • VI-C RQ3 Effect of the weighting factor
                      • VII Discussion
                        • VII-A Cases where FACOS outperforms DATYS
                        • VII-B Case where FACOS fail to exclude irrelevant threads
                        • VII-C Adding Semantic Information for API Content Search
                        • VII-D Threats to validity
                          • VIII Related Work
                            • VIII-A API Disambiguation
                            • VIII-B API Resource Retrieval
                            • VIII-C Contribution of StackOverflow for API documentation
                            • VIII-D Word Sense and Entity Disambiguation Study
                              • IX Conclusion and Future Work
                              • References

Fig 4 Thread 56135373 on Stack Overflow where API isreferred by a code snippet of the thread

Fig 5 Thread 16919751 on Stack Overflow where API comgooglecommonbaseCharMatcheris is not referred by contentof the thread

charmrdquo do not provide much information to identify whetherthe observed API method refers to the given API

Therefore we leverage the content of the thread whichmight be relevant to the content of the API method For exam-ple in the thread above its title which is shown in Figure 6ardquoOptional cannot be returned by stream() in Mockito Testclassesrdquo relates to the comment of the given API which isSets a return value to be returned when the method is calledin Figure 6b Due to this feature FACOS can successfullyconsider this thread as relevant while DATYS missed it

(2) The irrelevant thread contains the type name of the given API method. An example of this case is shown in Figure 5. In the thread (footnote 5), the given API method is com.google.common.base.CharMatcher.is, and there is a word that matches the simple name of the API method, "is", which we highlighted. Since the type of the given API method (i.e., CharMatcher) appears in both the textual content and the code snippet, DATYS mistakenly accepts the thread as referring to the given API. By leveraging the semantic knowledge learnt by the API relevance classifier, FACOS is able to detect the mismatch between the textual content and code snippet around the word "is" on one side and the API comment and implementation code on the other. FACOS can therefore conclude that the thread is irrelevant to the given API com.google.common.base.CharMatcher.is.

5 https://stackoverflow.com/questions/16919751

Fig. 6: The similarity in semantic meaning between (a) the API comment of method org.mockito.stubbing.OngoingStubbing.thenReturn and (b) the textual content (i.e., the title) of thread 56135373.

B. Case where FACOS fails to exclude irrelevant threads

Figure 7 shows a case where FACOS fails to exclude the thread (footnote 6) from the relevant results for the given API method org.mockito.Mockito.mock. The issue occurs when another API method has functionality similar to that of the given API method. The two methods usually have the same simple name and highly similar functionality descriptions.

In Figure 7, PowerMock and Mockito perform similar functions, such as mocking (i.e., creating a version of a service in order to quickly and reliably run tests on that service; see footnote 7). Since both of them have an API method whose simple name is mock, and both mock methods have the same API signature (i.e., parameters and return type), it would be easy to mistake one for the other, even for a human. Figure 8 shows the comment of the API method org.mockito.Mockito.mock (footnote 8), which is "Creates mock object of given class or interface". Because of the similarity between the comment of the API method from the Mockito library and the title of thread 30127057 in Figure 7, FACOS wrongly recognizes the simple API name mock in the thread as referring to the given API method org.mockito.Mockito.mock. In fact, the simple API name mock refers to the one from the PowerMock library.
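The ambiguity described above can be illustrated with a minimal sketch. The index contents and the helper names `simple_name` and `candidates_for` are hypothetical and only serve the illustration; in particular, the PowerMock entry is an assumed fully qualified name, since the text does not state which PowerMock method the thread refers to. The sketch shows only why matching on the simple name alone cannot separate the two libraries; it is not part of FACOS or DATYS.

```python
from typing import List

# Hypothetical mini-index of fully qualified API method names.
# The PowerMock entry is an assumption made for illustration only.
API_INDEX = [
    "org.mockito.Mockito.mock",
    "org.powermock.api.mockito.PowerMockito.mock",
    "com.google.common.base.CharMatcher.is",
]


def simple_name(fqn: str) -> str:
    """Return the last dot-separated segment of a fully qualified name."""
    return fqn.rsplit(".", 1)[-1]


def candidates_for(mention: str, index: List[str] = API_INDEX) -> List[str]:
    """Return every indexed API whose simple name equals the mention."""
    return [fqn for fqn in index if simple_name(fqn) == mention]


if __name__ == "__main__":
    # A bare mention of "mock" matches both libraries, so extra context
    # (type information, surrounding text) is needed to pick one.
    print(candidates_for("mock"))
```

When both candidates also share a signature and similar documentation, as in the case above, neither the type-based nor the text-based signal is guaranteed to separate them.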

6 https://stackoverflow.com/questions/30127057
7 https://circleci.com/blog/how-to-test-software-part-i-mocking-stubbing-and-contract-testing
8 https://javadoc.io/static/org.mockito/mockito-all/2.0.2-beta/org/mockito/Mockito.html


Fig. 7: Thread 30127057 on Stack Overflow, which FACOS falsely recognizes as referring to the API method org.mockito.Mockito.mock.

Fig. 8: The API comment of method org.mockito.Mockito.mock.

C. Adding Semantic Information for API Content Search

Searching for an API method and its relevant information cannot always rely on syntactic information. For some libraries, methods are chained (as in Figure 4), so the type is not displayed and a type system is required to determine it. In addition, the type system is not fully reliable, because import information, variable declarations, etc. may be missing. It is also ineffective to rely only on the similarity of word representations provided by language models for API content search. API content search differs from other tasks such as code search. In code search, because the terms "transfer" and "convert" are both related to currency, the method named "transferSGDToUSD" might be the answer to the query "convert SGD to USD". However, searching for an API method requires an explicit mention of the API name in the query, which can be difficult because different APIs may share the same API method name. Solving the problem solely with a model that learns only the syntactic meaning of textual and code content is therefore insufficient. We thus incorporated semantic information for API content search, which generated better results, as demonstrated by our experiments.
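To make the idea of balancing the two kinds of information concrete, the following is a minimal sketch that assumes the joint relevance score is a convex combination of a syntactic score and a semantic score, both in [0, 1]. The function names, the combination formula, and the threshold are assumptions made for illustration; the actual formulation is the one given in Section IV-D.

```python
def joint_relevance(syntactic: float, semantic: float, weight: float) -> float:
    """Combine two scores in [0, 1] with a weighting factor in [0, 1].

    Assumed form: weight * syntactic + (1 - weight) * semantic.
    This is an illustrative formula, not necessarily the one used by FACOS.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * syntactic + (1.0 - weight) * semantic


def is_relevant(syntactic: float, semantic: float,
                weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Label a thread as relevant when the joint score reaches the threshold.

    The default weight and threshold are hypothetical values.
    """
    return joint_relevance(syntactic, semantic, weight) >= threshold


if __name__ == "__main__":
    # A thread with a strong type match but weak semantic agreement.
    print(joint_relevance(syntactic=0.9, semantic=0.2, weight=0.5))
    print(is_relevant(syntactic=0.9, semantic=0.2, weight=0.5))
```

Under this assumed form, a weight of 1 reduces the score to the purely syntactic signal and a weight of 0 to the purely semantic one, which is the kind of trade-off a weighting factor is meant to expose.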

D. Threats to validity

A threat to internal validity is related to experiment bias. We obtained our dataset from another work and ran the baseline using the code provided by its authors. We checked our code multiple times to ensure that we did not make mistakes. We believe there is little threat to internal validity. We also release the dataset and the code for our experiments for all to use.

A threat to external validity is whether the approach is applicable to platforms other than Stack Overflow. Our experiment focuses on Stack Overflow and the Java programming language; it is therefore uncertain whether FACOS can be applied to other discussion venues that also discuss API issues. One potential platform is Reddit, which has subreddits (i.e., places gathering Reddit threads that discuss a particular topic) devoted to programming languages and frameworks. Reddit threads have a title, textual content, and code content, the same as Stack Overflow threads. This similarity suggests that we can potentially apply FACOS to Reddit too; we leave this possibility for future work. Regarding the threat that changing the target programming language would affect the accuracy of FACOS: although we focus only on Java, the features required by the approach (e.g., API comment/documentation, API implementation code, fully qualified name, class/type name, etc.) can be found in other programming languages. We leave it as future work to study whether FACOS works well when applied to other programming languages.

There is also a threat to construct validity regarding whether precision, recall, and F1-score are suitable evaluation metrics for our task. Our task is a classification task, and many works in software engineering have used precision, recall, and F1-score as the evaluation metrics for classification tasks [6], [14]–[16]. Thus, we believe the threat is minimal.
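For reference, the three metrics can be computed as in the short sketch below, which uses the standard definitions over a set of threads predicted as relevant and a set of ground-truth relevant threads. The thread identifiers and the function name are hypothetical, and the convention of returning 0 when a denominator is empty is an assumption of this sketch.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 over two sets of thread ids.

    Returns 0.0 for a metric whose denominator would be zero
    (a convention chosen for this sketch).
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical thread ids: two of three predictions are correct,
    # and two of three relevant threads are found.
    predicted = {"t1", "t2", "t3"}
    relevant = {"t1", "t2", "t4"}
    print(precision_recall_f1(predicted, relevant))
```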

VIII. RELATED WORK

A. API Disambiguation

Many past works [14], [17]–[23] deal with API disambiguation. There are two main groups: informal text disambiguation [14], [18]–[21] and code snippet disambiguation [17], [22], [23]. As their names suggest, the first aims to disambiguate API mentions in textual content, while the second deals with disambiguating API mentions in code snippets.

For informal text disambiguation, several works utilize classical information retrieval approaches, such as the Vector Space Model and Latent Semantic Indexing, to disambiguate API mentions [18]–[20], while others use heuristics [21]. Bacchelli et al. [20] combined string matching and information retrieval algorithms to link emails to source code entities. Dagenais and Robillard [21] identified Java APIs mentioned in support channels (e.g., mailing lists, forums), documents, and code snippets. Ye et al. [14] worked on API disambiguation in the textual content of Stack Overflow threads by utilizing mention-mention similarity, mention-entry similarity, and a scope filter. Luong et al. [6] used type scoping to disambiguate API mentions in Stack Overflow threads.

The work on API disambiguation in Stack Overflow threads can be viewed as the other side of the coin of the task of finding threads that are relevant to an API. When we disambiguate an API mention in a thread, the disambiguated API is relevant to the thread, as the thread is talking about that API.


B. API Resource Retrieval

Several studies have explored code search for APIs and the retrieval of related information. Lv et al. [24] proposed CodeHow to deal with the lack of query understanding ability in existing tools. By expanding a user query with APIs, CodeHow can identify potential APIs and perform a code search based on the Extended Boolean model, which considers the impact of APIs on code search. Gu et al. [25] proposed DeepAPI to search for API usage sequences. Rather than assuming a bag of words, it learns the sequence of words within a query and the sequence of APIs associated with it. DeepAPI encodes a user query into a fixed-length context vector, from which it generates an API sequence.

Other studies have exploited different aspects of APIs and natural language to better retrieve APIs and their related information. The techniques include using global and local contexts of the queries [26], leveraging usage similarity for effective retrieval of API examples [27], moving from word embeddings to document similarities for improved API retrieval [28], and exploiting user knowledge [29] and the task-API knowledge gap [30] during retrieval of semantically annotated API operations.

Wang et al. [31] developed a transformer-based framework for unifying code summarization and code search. Shahbazi et al. [32] proposed API2Com to improve automatically generated code comments by fetching API documentation. Alhamzeh et al. [33] built a DistilBERT-based argumentation retrieval approach for answering comparative questions. Dibia [34] and Vale and Maia [35] developed, respectively, a usable library for question answering with contextual query expansion and a question-answering assistant for software development using a transformer-based language model. Ciniselli et al. [36] performed an empirical study on the usage of Transformer models for code completion.

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a target API.

C. Contribution of Stack Overflow for API documentation

Treude and Robillard [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the dynamics of API discussions on Stack Overflow, whereas Squire [38] characterized Stack Overflow as social media for developer support and measured its utility. Ahasanuzzaman et al. [39] and Baltes et al. [40] worked, respectively, on classifying Stack Overflow posts on API issues and on contextual documentation referencing on Stack Overflow. The dichotomy among these studies is notable: some research, such as [41] and [42], studies how API documentation fails and how APIs are misused on Stack Overflow, while other studies [43], [44] lean heavily on the crowdsourced knowledge on Stack Overflow, for example for automated API documentation with tutorials. Similarly, crowdsourced knowledge was embraced by [45] and [46], which explored innovation diffusion through link sharing and web resource recommendation for hyperlinks on Stack Overflow.

Our work supports the effort in this line of study. FACOS can automatically find threads about a particular API on Stack Overflow, which can then be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

Several works focus on the disambiguation task [21]–[23], [47]. We have also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have addressed a wide range of problems by resolving lexical ambiguity. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data [51]. Researchers have also associated place-name mentions in unstructured text with their actual references in geographic space [49]. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Named entity recognition and disambiguation have been used to add meaning to social network posts [48]. Tools such as entity-fishing [52] were also developed to provide entity recognition and disambiguation as a service. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach to search for Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS improves over DATYS when both approaches are adapted to the search task. We added a weight parameter to balance the use of syntactic and semantic information for retrieving API mentions and related threads, and we demonstrated the utility of this weighting factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that has more threads and APIs. We also plan to make our approach robust across more programming languages, so that it can be more useful to developers.

Replication Package. The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, "What makes APIs hard to learn? Answers from developers," IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, "What do client developers concern when using web APIs? An empirical study on developer forums and Stack Overflow," in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, "How do API changes trigger Stack Overflow discussions? A study on the Android SDK," in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.


[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, "Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow," Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, "Automatic mining of opinions expressed about APIs in Stack Overflow," IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, "Disambiguating mentions of API methods in Stack Overflow via type scoping," in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "API method recommendation without worrying about the task-API knowledge gap," in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, "RACK: Automatic API recommendation using crowdsourced knowledge," in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., "CodeBERT: A pre-trained model for programming and natural languages," arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "ELECTRA: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, "APIReal: an API recognition and linking approach for online developer forums," Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, "Identifying self-admitted technical debt in open source projects using text mining," Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, "Categorizing the content of GitHub README files," Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, "Learning from examples to find fully qualified names of API elements in code snippets," in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, "Recovering traceability links between code and documentation," TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, "Recovering documentation-to-source-code traceability links using latent semantic indexing," in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, "Linking e-mails and source code artifacts," in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, "Recovering traceability links between an API and its learning resources," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, "Live API documentation," in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, "Statistical learning of API fully qualified names in code snippets of online forums," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, "CodeHow: Effective code search based on API understanding and extended boolean model (E)," in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, "Deep API learning," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, pp. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, "Complementing global and local contexts in representing API descriptions to improve API retrieval tasks," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, pp. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, "Leveraging usage similarity for effective retrieval of examples in code repositories," in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE '10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, "From word embeddings to document similarities for improved information retrieval in software engineering," in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE '16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, "Exploiting user knowledge during retrieval of semantically annotated API operations," in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR '11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, "API method recommendation without worrying about the task-API knowledge gap," in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, "Trans^3: A transformer-based framework for unifying code summarization and code search," arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, "API2Com: On the improvement of automatically generated code comments using API documentations," arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, "DistilBERT-based argumentation retrieval for answering comparative questions," Working Notes of CLEF, 2021.

[34] V. Dibia, "NeuralQA: A usable library for question answering (contextual query expansion + BERT) on large datasets," arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, "Towards a question answering assistant for software development using a transformer-based language model," arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, "An empirical study on the usage of transformer models for code completion," arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, "Augmenting API documentation with insights from Stack Overflow," in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, ""Should we move to Stack Overflow?" Measuring the utility of social media for developer support," in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, "Classifying Stack Overflow posts on API issues," in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, "Contextual documentation referencing on Stack Overflow," IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, "How API documentation fails," IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, "Are code examples on an online Q&A forum reliable? A study of API misuse on Stack Overflow," in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, "Crowdsourced knowledge on Stack Overflow: A systematic mapping study," in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, "Automated API documentation with tutorials generated from Stack Overflow," in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, "A study of innovation diffusion through link sharing on Stack Overflow," in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.


[46] J. Li, Z. Xing, D. Ye, and X. Zhao, "From discussion to wisdom: web resource recommendation for hyperlinks in Stack Overflow," in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, "Statistical translation of English texts to API code templates," in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, "Adding meaning to social network microposts via multiple named entity disambiguation APIs and tracking their data provenance," International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, "GeoTxt: A web API to leverage place references in text," in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR '13. New York, NY, USA: Association for Computing Machinery, 2013, pp. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, "SenseRelate::TargetWord - a generalized framework for word sense disambiguation," in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, "Prediction of election result by enhanced sentiment analysis on Twitter data using word sense disambiguation," in 2015 International Conference on Control Communication & Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, "entity-fishing: a DARIAH entity recognition and disambiguation service," Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, "Do we need entity-centric knowledge bases for entity disambiguation?" in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, "GEEK: Incremental graph-based entity disambiguation," in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, "Combining heterogeneous classifiers for word sense disambiguation," in Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, "A fully unsupervised word sense disambiguation method using dependency knowledge," in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.



Fig 7 Thread 30127057 on Stack Overflow that FACOSfalsely recognize as referring to the API method orgmockitoMockitomock

Fig 8 The API comment of method orgmockitoMockitomock

C Adding Semantic Information for API Content Search

The API method and relevant information search does notalways utilize syntactic information For some libraries somemethods are chained (as in Figure 4) so the type is notdisplayed It requires a type system to determine its type Inaddition the Type system is not 100 percent reliable due toa lack of import information variables etc It is ineffectiveto use the similarity of word representations provided bylanguage models for API contents search API contents searchis unique to other tasks such as code search In code searchdue to the term rdquotransferrdquo and rdquoconvertrdquo being related tocurrency the method named rdquotransferSGDToUSDrdquo might bethe answer to the query rdquoconvert SGD to USDrdquo Howeversearching for an API method requires explicit mention of theAPI name in the query which could be difficult as differentAPIs may share the same API method names To solve theproblem solely using a model that only learns the syntacticmeaning of textual and code content is insufficient Thereforewe incorporated semantic information for API content searchwhich generated better results as we have demonstrated fromthe results of our experiment

D Threats to validity

A threat to internal validity is related to experiment bias Weobtain our dataset from another work We also run the baselineusing the code provided by the author We then check our codemultiple times to ensure that we do not make mistakes We

believe there should be little threats to internal validity Wealso release the dataset and the code for our experiments forall to use

A threat to external validity is on whether the approach isapplicable to other platform other than Stack Overflow Thisexperiment mainly focuses on Stack Overflow and Java pro-gramming language therefore it is uncertain whether FACOScan be applied on other discussion venues that also talk aboutAPI issues The potential platform might be Reddit which hassub-reddit (ie a place gathering Redditrsquos threads discussinga particular problem) discussing programming languages andframeworks It has title textual and code content as sameas Stack Overflow Thus the similarity suggests that we canpotentially apply FACOS to Reddit too We leave this possi-bility for a future work Regarding the threat that changingtargeting programming language would affect the accuracyof FACOS although we focus only on Java the features(eg API commentdocumentation API implementation codefully qualified name classtype name etc) required for theapproach can be found from other programming languagesTherefore we leave this as a future work to study whether itmay work well when applied to other programming languages

There is a also threat of construct validity on whetherprecision recall and F1-score is a suitable evaluation metricfor our task Our task is a classification task Many work insoftware engineering has used precision recall and F1-scoreas the evaluation metric for classification task [6] [14]ndash[16]Thus we believe the threats are minimal

VIII RELATED WORK

A API Disambiguation

Many past works [14] [17]ndash[23] deals with API disam-biguation There are two main groups informal text dis-ambiguation [14] [18]ndash[21] and code snippet disambigua-tion [17] [22] [23] As suggested by its name the first aims todisambiguate API mentions in textual content while the seconddeals with disambiguating API mention in code snippets

For informal text disambiguation several work utilizesclassical information retrieval approaches such as Vector SpaceModel and Latent Semantic Indexing to disambiguate the APImentions [18]ndash[20] while some others use heuristics [21]Bacchelli et al [20] combined string matching and informationretrieval algorithms to link emails to source code entitiesDagenais and Robillard [21] identified Java APIs mentioned insupport channels (eg mailing list forums) documents andcode snippets Ye et al [14] worked on API disambiguationin the textual content of Stack Overflow thread by utiliz-ing mention-mention similarity mention-entry similarity andscope filter Luong et al [6] used type scoping to disambiguatethe API mentions in Stack Overflow thread

The work on API disambiguation on Stack Overflow threadcan be viewed as another side of the coin of the task in findingthreads that are relevant to the API When we disambiguatean API in a thread the disambiguated API is relevant to thethread as the thread is talking about the API

9

B API Resource Retrieval

Several studies have explored how to search for the code forAPI and related information retrieval Lv et al [24] proposedCodehow to deal with the lack of query understanding abilityof the existing tool By expanding a user query with APIsCodehow can identify potential APIs and perform a codesearch based on the Extended Boolean model which considersthe impact of APIs on code search Gu et al [25] proposedDeepAPI to search for API usage sequences As opposed toassuming a bag of words it learns the sequence of wordswithin a query and the sequence of APIs associated withit DeepAPI encodes a single user query into a fixed-lengthcontext vector to generate an API sequence

Other studies have also exploited different aspects of APIsand natural language to better retrieve the APIs and theirrelated information The techniques include using global andlocal contexts of the queries [26] leveraging usage similar-ity for effective retrieval of API examples [27] employingword embeddings to document similarities for improved APIretrieval [28] exploiting user knowledge [29] and task-APIknowledge gap [30] during retrieval of semantically annotatedAPI operations

Wang et al [31] developed a transformer-based frameworkfor unifying code summarization and code search Shahbaziet al [32] proposed API2Com to improve automaticallygenerated code comments by fetching API documentationsAlhamzeh et al [33] built DistilBERT-based argumentationretrieval for answering comparative questions Dibia et al[34] and Vale et al [35] developed a usable library forquestion answering with contextual query expansion and aquestion-answering assistant for software development using atransformer-based language model respectively Ciniselli et al[36] performed an empirical study on the usage of TransformerModels for code search and completion

Our study also works on API resource retrieval. Specifically, we retrieve Stack Overflow threads that are relevant to a given target API.

C. Contribution of Stack Overflow for API Documentation

Treude and Robillard [37] studied augmenting API documentation with insights from Stack Overflow. Parnin et al. [4] explored crowd documentation by examining the coverage and dynamics of API discussions on Stack Overflow, whereas Squire [38] measured the utility of Stack Overflow as social media for developer support. Ahasanuzzaman et al. [39] classified Stack Overflow posts on API issues, and Baltes et al. [40] studied contextual documentation referencing on Stack Overflow. These studies diverge notably: some, such as [41] and [42], study how API documentation fails and how APIs are misused on Stack Overflow, whereas others [43], [44] lean heavily on crowdsourced knowledge on Stack Overflow, for example to generate API documentation with tutorials automatically. Similarly, crowdsourced knowledge was embraced by [45] and [46], which explored innovation diffusion through link sharing and web resource recommendation for hyperlinks on Stack Overflow, respectively.

Our work supports this line of study. FACOS can automatically find Stack Overflow threads about a particular API, which can then be used to augment the corresponding API documentation.

D. Word Sense and Entity Disambiguation Study

Several works focus on the disambiguation task [21]–[23], [47]. We also found a variety of word sense and entity disambiguation methods employed for different objectives [48]–[56]. These studies have solved a range of problems by resolving lexical ambiguity. The task of word sense disambiguation is to identify a target word's intended meaning by examining its context. Researchers have used word sense disambiguation to predict election results through enhanced sentiment analysis on Twitter data [51]. Researchers have also associated place-name mentions in unstructured text with their actual references in geographic space using disambiguation [49]. Other research has proposed unsupervised, knowledge-free, and interpretable word sense disambiguation for various applications. Named entity recognition and disambiguation have been used to add meaning to social network microposts [48], and tools such as entity-fishing [52] were developed to offer entity recognition and disambiguation as a service. In recent years, tools that allow researchers to recognize and extract named entities have become increasingly popular.

IX. CONCLUSION AND FUTURE WORK

We present FACOS, an approach for searching Stack Overflow threads that refer to an API whose usage users or tools may want to find. We utilize the semantic and syntactic features of the paragraphs and code snippets in a thread to determine whether the thread is related to a given API. Our evaluation shows that FACOS outperforms DATYS when both approaches are adapted to the search task. We added a weight parameter to balance the use of syntactic and semantic information when retrieving API mentions and related threads, and we demonstrated the utility of this weighting factor through an ablation study. In the future, we plan to improve our approach with a larger dataset that contains more threads and APIs. We also plan to make our approach robust across more programming languages so that it can be more useful to developers.
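The role of the weight parameter can be illustrated with a minimal sketch. It assumes that both the syntactic score and the semantic score are already normalized to [0, 1] and that they are blended as a convex combination; the function names, the thread identifiers, and the score values are hypothetical and are not taken from the FACOS implementation.

```python
from typing import Dict, List, Tuple


def joint_relevance(syntactic: float, semantic: float, weight: float) -> float:
    """Blend two scores in [0, 1]; weight=1.0 uses only the syntactic score.

    The convex combination is an assumption made for illustration.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * syntactic + (1.0 - weight) * semantic


def rank_threads(scores: Dict[str, Tuple[float, float]], weight: float) -> List[str]:
    """Return thread ids sorted by descending joint relevance."""
    return sorted(
        scores,
        key=lambda tid: joint_relevance(scores[tid][0], scores[tid][1], weight),
        reverse=True,
    )


# Hypothetical (syntactic, semantic) scores for two threads.
example = {"thread_a": (0.9, 0.1), "thread_b": (0.2, 0.8)}
print(rank_threads(example, weight=1.0))  # ranking driven by the syntactic score only
print(rank_threads(example, weight=0.0))  # ranking driven by the semantic score only
```

Under these assumptions, moving the weight between 0 and 1 shifts the ranking between the two sources of evidence, which is the trade-off that an ablation over the weighting factor would examine.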

Replication Package. The source code for FACOS is available at https://anonymous.4open.science/r/facos-E5C6

REFERENCES

[1] M. P. Robillard, “What makes APIs hard to learn? Answers from developers,” IEEE Software, vol. 26, no. 6, pp. 27–34, 2009.

[2] P. K. Venkatesh, S. Wang, F. Zhang, Y. Zou, and A. E. Hassan, “What do client developers concern when using web APIs? An empirical study on developer forums and Stack Overflow,” in 2016 IEEE International Conference on Web Services (ICWS). IEEE, 2016, pp. 131–138.

[3] M. Linares-Vasquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “How do API changes trigger Stack Overflow discussions? A study on the Android SDK,” in Proceedings of the 22nd International Conference on Program Comprehension, 2014, pp. 83–94.

[4] C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, “Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow,” Georgia Institute of Technology, Tech. Rep., vol. 11, 2012.

[5] G. Uddin and F. Khomh, “Automatic mining of opinions expressed about APIs in Stack Overflow,” IEEE Transactions on Software Engineering, 2019.

[6] K. Luong, F. Thung, and D. Lo, “Disambiguating mentions of API methods in Stack Overflow via type scoping,” in ICSME. IEEE, 2021.

[7] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “API method recommendation without worrying about the task-API knowledge gap,” in ASE. IEEE, 2018.

[8] M. M. Rahman, C. K. Roy, and D. Lo, “Rack: Automatic API recommendation using crowdsourced knowledge,” in SANER 2016, vol. 1. IEEE, 2016.

[9] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., “CodeBERT: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020.

[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.

[12] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “ELECTRA: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

[14] D. Ye, L. Bao, Z. Xing, and S.-W. Lin, “APIReal: an API recognition and linking approach for online developer forums,” Empirical Software Engineering, vol. 23, no. 6, pp. 3129–3160, 2018.

[15] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, “Identifying self-admitted technical debt in open source projects using text mining,” Empirical Software Engineering, vol. 23, no. 1, pp. 418–451, 2018.

[16] G. A. A. Prana, C. Treude, F. Thung, T. Atapattu, and D. Lo, “Categorizing the content of GitHub README files,” Empirical Software Engineering, vol. 24, no. 3, pp. 1296–1327, 2019.

[17] C. K. Saifullah, M. Asaduzzaman, and C. K. Roy, “Learning from examples to find fully qualified names of API elements in code snippets,” in ASE. IEEE, 2019.

[18] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, “Recovering traceability links between code and documentation,” TSE, vol. 28, no. 10, 2002.

[19] A. Marcus and J. I. Maletic, “Recovering documentation-to-source-code traceability links using latent semantic indexing,” in ICSE. IEEE, 2003.

[20] A. Bacchelli, M. Lanza, and R. Robbes, “Linking e-mails and source code artifacts,” in ICSE, 2010.

[21] B. Dagenais and M. P. Robillard, “Recovering traceability links between an API and its learning resources,” in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 47–57.

[22] S. Subramanian, L. Inozemtseva, and R. Holmes, “Live API documentation,” in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 643–652.

[23] H. Phan, H. A. Nguyen, N. M. Tran, L. H. Truong, A. T. Nguyen, and T. N. Nguyen, “Statistical learning of API fully qualified names in code snippets of online forums,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 632–642.

[24] F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang, and J. Zhao, “CodeHow: Effective code search based on API understanding and extended boolean model (E),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 260–270.

[25] X. Gu, H. Zhang, D. Zhang, and S. Kim, “Deep API learning,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, p. 631–642. [Online]. Available: https://doi.org/10.1145/2950290.2950334

[26] T. Nguyen, N. Tran, H. Phan, T. Nguyen, L. Truong, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, “Complementing global and local contexts in representing API descriptions to improve API retrieval tasks,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2018. New York, NY, USA: Association for Computing Machinery, 2018, p. 551–562. [Online]. Available: https://doi.org/10.1145/3236024.3236036

[27] S. K. Bajracharya, J. Ossher, and C. V. Lopes, “Leveraging usage similarity for effective retrieval of examples in code repositories,” in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE ’10. New York, NY, USA: Association for Computing Machinery, 2010, p. 157–166. [Online]. Available: https://doi.org/10.1145/1882291.1882316

[28] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, “From word embeddings to document similarities for improved information retrieval in software engineering,” in Proceedings of the 38th International Conference on Software Engineering, ser. ICSE ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 404–415. [Online]. Available: https://doi.org/10.1145/2884781.2884862

[29] M. Roj, “Exploiting user knowledge during retrieval of semantically annotated API operations,” in Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ser. ESAIR ’11. New York, NY, USA: Association for Computing Machinery, 2011, p. 21–22. [Online]. Available: https://doi.org/10.1145/2064713.2064726

[30] Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang, “API method recommendation without worrying about the task-API knowledge gap,” in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 293–304.

[31] W. Wang, Y. Zhang, Z. Zeng, and G. Xu, “Trans^3: A transformer-based framework for unifying code summarization and code search,” arXiv preprint arXiv:2003.03238, 2020.

[32] R. Shahbazi, R. Sharma, and F. H. Fard, “API2Com: On the improvement of automatically generated code comments using API documentations,” arXiv preprint arXiv:2103.10668, 2021.

[33] A. Alhamzeh, M. Bouhaouel, E. Egyed-Zsigmond, and J. Mitrovic, “DistilBERT-based argumentation retrieval for answering comparative questions,” Working Notes of CLEF, 2021.

[34] V. Dibia, “NeuralQA: A usable library for question answering (contextual query expansion + BERT) on large datasets,” arXiv preprint arXiv:2007.15211, 2020.

[35] L. d. N. Vale and M. d. A. Maia, “Towards a question answering assistant for software development using a transformer-based language model,” arXiv preprint arXiv:2103.09423, 2021.

[36] M. Ciniselli, N. Cooper, L. Pascarella, A. Mastropaolo, E. Aghajani, D. Poshyvanyk, M. Di Penta, and G. Bavota, “An empirical study on the usage of transformer models for code completion,” arXiv preprint arXiv:2108.01585, 2021.

[37] C. Treude and M. P. Robillard, “Augmenting API documentation with insights from Stack Overflow,” in 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 2016, pp. 392–403.

[38] M. Squire, “‘Should we move to Stack Overflow?’ Measuring the utility of social media for developer support,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2. IEEE, 2015, pp. 219–228.

[39] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, “Classifying Stack Overflow posts on API issues,” in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 244–254.

[40] S. Baltes, C. Treude, and M. P. Robillard, “Contextual documentation referencing on Stack Overflow,” IEEE Transactions on Software Engineering, 2020.

[41] G. Uddin and M. P. Robillard, “How API documentation fails,” IEEE Software, vol. 32, no. 4, pp. 68–75, 2015.

[42] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim, “Are code examples on an online Q&A forum reliable? A study of API misuse on Stack Overflow,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 2018, pp. 886–896.

[43] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowdsourced knowledge on Stack Overflow: A systematic mapping study,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 180–185.

[44] A. M. Rocha and M. A. Maia, “Automated API documentation with tutorials generated from Stack Overflow,” in Proceedings of the 30th Brazilian Symposium on Software Engineering, 2016, pp. 33–42.

[45] C. Gomez, B. Cleary, and L. Singer, “A study of innovation diffusion through link sharing on Stack Overflow,” in 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013, pp. 81–84.

[46] J. Li, Z. Xing, D. Ye, and X. Zhao, “From discussion to wisdom: web resource recommendation for hyperlinks in Stack Overflow,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, “Statistical translation of English texts to API code templates,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, “Adding meaning to social network microposts via multiple named entity disambiguation APIs and tracking their data provenance,” International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, “GeoTxt: A web API to leverage place references in text,” in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR ’13. New York, NY, USA: Association for Computing Machinery, 2013, p. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, “SenseRelate::TargetWord - a generalized framework for word sense disambiguation,” in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on Twitter data using word sense disambiguation,” in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, “entity-fishing: a DARIAH entity recognition and disambiguation service,” Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, “Do we need entity-centric knowledge bases for entity disambiguation?” in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, “Geek: Incremental graph-based entity disambiguation,” in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, “Combining heterogeneous classifiers for word sense disambiguation,” in Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, “A fully unsupervised word sense disambiguation method using dependency knowledge,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.

B API Resource Retrieval

Several studies have explored how to search for the code forAPI and related information retrieval Lv et al [24] proposedCodehow to deal with the lack of query understanding abilityof the existing tool By expanding a user query with APIsCodehow can identify potential APIs and perform a codesearch based on the Extended Boolean model which considersthe impact of APIs on code search Gu et al [25] proposedDeepAPI to search for API usage sequences As opposed toassuming a bag of words it learns the sequence of wordswithin a query and the sequence of APIs associated withit DeepAPI encodes a single user query into a fixed-lengthcontext vector to generate an API sequence

Other studies have also exploited different aspects of APIsand natural language to better retrieve the APIs and theirrelated information The techniques include using global andlocal contexts of the queries [26] leveraging usage similar-ity for effective retrieval of API examples [27] employingword embeddings to document similarities for improved APIretrieval [28] exploiting user knowledge [29] and task-APIknowledge gap [30] during retrieval of semantically annotatedAPI operations

Wang et al [31] developed a transformer-based frameworkfor unifying code summarization and code search Shahbaziet al [32] proposed API2Com to improve automaticallygenerated code comments by fetching API documentationsAlhamzeh et al [33] built DistilBERT-based argumentationretrieval for answering comparative questions Dibia et al[34] and Vale et al [35] developed a usable library forquestion answering with contextual query expansion and aquestion-answering assistant for software development using atransformer-based language model respectively Ciniselli et al[36] performed an empirical study on the usage of TransformerModels for code search and completion

Our study also work on API resource retrieval Specificallywe retrieve Stack Overflow threads that are relevant to a targetAPI that we are searching for

C Contribution of StackOverflow for API documentation

Treude et al [37] studied the augmenting API documenta-tion with insights from stack overflow [4] explored the Crowddocumentation by examining the dynamics of API discussionson Stack Overflow whereas [38] dubbed the Stack Overflow asthe Social Media for Developer Support in terms of providedutilities [39] and [40] worked on classifying stack overflowposts on API issues and contextual documentation referencingon stack overflow The dichotomy of these studies is notablewhere some research like [41] and [42] studies how APIdocumentation fails via the API misuse on stack overflowother studies [43] [44] heavily lean on the Crowdsourcedknowledge on stack overflow for automated API documen-tation with tutorials Similarly crowdsourced knowledge washailed by [45] and [46] who explored the innovation diffusionand web resource recommendation for hyperlinks through linksharing on stack overflow

Our work support the effort in this line of study FACOScan automatically find threads about a particular API in StackOverflow that can be augmented to the corresponding APIdocumentation

D Word Sense and Entity Disambiguation Study

There are several works focused on disambiguationtask [21]ndash[23] [47] We also have found a variety of wordsense and entity disambiguation methods employed for differ-ent objectives [48]ndash[56] These studies have solved myriads ofproblems via solving lexical disambiguations in literature Thetask of word sense disambiguation is to identify a target wordrsquosintended meaning by examining its context Researchers haveused Word Sense Disambiguation to predict election resultsby enhanced sentiment analysis on Twitter data Researchershave associated place-name mentions in unstructured text withtheir actual references in geographic space using word dis-ambiguation Other research has also proposed unsupervisedknowledge-Free and interpretable Word Sense Disambigua-tion for various applications Researchers used this approach toadd meaning to social network posts when it comes to namedentity recognition and disambiguation Different Entity-fishingtools were also developed for facilitating the recognition anddisambiguation service In recent years tools that allow re-searchers to recognize and extract named entities have becomeincreasingly popular

IX CONCLUSION AND FUTURE WORK

We present FACOS an approach to search Stack Overflowthreads that refer to API of which users or tools may want tofind the usage We utilize the semantic and syntactic featuresof the paragraphs and code snippets in a thread to determinewhether the thread is related to a given API Our evaluationshows that FACOS has an improvement compared to DATYSwhen adapting both approaches to the search task We haveadded a weight parameter to balance the usage of syntactic andsemantic information for retrieving API mentions and relatedthreads We have proved the utility of the weight factor byincorporating an ablation study In future we plan to improveour approach with larger dataset which has more threads andAPIs Also we plan to make our approach become robust withmore programming languages so that it can be more useful todevelopers

Replication Package The source code for FACOS is availableat httpsanonymous4opensciencerfacos-E5C6

REFERENCES

[1] M P Robillard ldquoWhat makes apis hard to learn answers from devel-opersrdquo IEEE Software vol 26 no 6 pp 27ndash34 2009

[2] P K Venkatesh S Wang F Zhang Y Zou and A E Hassan ldquoWhatdo client developers concern when using web apis an empirical studyon developer forums and stack overflowrdquo in 2016 IEEE InternationalConference on Web Services (ICWS) IEEE 2016 pp 131ndash138

[3] M Linares-Vasquez G Bavota M Di Penta R Oliveto and D Poshy-vanyk ldquoHow do api changes trigger stack overflow discussions a studyon the android sdkrdquo in proceedings of the 22nd International Conferenceon Program Comprehension 2014 pp 83ndash94

10

[4] C Parnin C Treude L Grammel and M-A Storey ldquoCrowd docu-mentation Exploring the coverage and the dynamics of api discussionson stack overflowrdquo Georgia Institute of Technology Tech Rep vol 112012

[5] G Uddin and F Khomh ldquoAutomatic mining of opinions expressed aboutapis in stack overflowrdquo IEEE Transactions on Software Engineering2019

[6] K Luong F Thung and D Lo ldquoDisambiguating mentions of apimethods in stack overflow via type scopingrdquo in ICSME IEEE 2021

[7] Q Huang X Xia Z Xing D Lo and X Wang ldquoApi methodrecommendation without worrying about the task-api knowledge gaprdquoin ASE IEEE 2018

[8] M M Rahman C K Roy and D Lo ldquoRack Automatic api recommen-dation using crowdsourced knowledgerdquo in SANER 2016 vol 1 IEEE2016

[9] Z Feng D Guo D Tang N Duan X Feng M Gong L Shou B QinT Liu D Jiang et al ldquoCodebert A pre-trained model for programmingand natural languagesrdquo arXiv preprint arXiv200208155 2020

[10] J Devlin M-W Chang K Lee and K Toutanova ldquoBert Pre-trainingof deep bidirectional transformers for language understandingrdquo arXivpreprint arXiv181004805 2018

[11] A Vaswani N Shazeer N Parmar J Uszkoreit L Jones A N GomezŁ Kaiser and I Polosukhin ldquoAttention is all you needrdquo in Advancesin neural information processing systems 2017 pp 5998ndash6008

[12] K Clark M-T Luong Q V Le and C D Manning ldquoElectra Pre-training text encoders as discriminators rather than generatorsrdquo arXivpreprint arXiv200310555 2020

[13] Y Liu M Ott N Goyal J Du M Joshi D Chen O Levy M LewisL Zettlemoyer and V Stoyanov ldquoRoberta A robustly optimized bertpretraining approachrdquo arXiv preprint arXiv190711692 2019

[14] D Ye L Bao Z Xing and S-W Lin ldquoApireal an api recognitionand linking approach for online developer forumsrdquo Empirical SoftwareEngineering vol 23 no 6 pp 3129ndash3160 2018

[15] Q Huang E Shihab X Xia D Lo and S Li ldquoIdentifying self-admittedtechnical debt in open source projects using text miningrdquo EmpiricalSoftware Engineering vol 23 no 1 pp 418ndash451 2018

[16] G A A Prana C Treude F Thung T Atapattu and D Lo ldquoCategoriz-ing the content of github readme filesrdquo Empirical Software Engineeringvol 24 no 3 pp 1296ndash1327 2019

[17] C K Saifullah M Asaduzzaman and C K Roy ldquoLearning fromexamples to find fully qualified names of api elements in code snippetsrdquoin ASE IEEE 2019

[18] G Antoniol G Canfora G Casazza A De Lucia and E MerloldquoRecovering traceability links between code and documentationrdquo TSEvol 28 no 10 2002

[19] A Marcus and J I Maletic ldquoRecovering documentation-to-source-codetraceability links using latent semantic indexingrdquo in ICSE IEEE 2003

[20] A Bacchelli M Lanza and R Robbes ldquoLinking e-mails and sourcecode artifactsrdquo in ICSE 2010

[21] B Dagenais and M P Robillard ldquoRecovering traceability links betweenan api and its learning resourcesrdquo in 2012 34th international conferenceon software engineering (icse) IEEE 2012 pp 47ndash57

[22] S Subramanian L Inozemtseva and R Holmes ldquoLive api documenta-tionrdquo in Proceedings of the 36th International Conference on SoftwareEngineering 2014 pp 643ndash652

[23] H Phan H A Nguyen N M Tran L H Truong A T Nguyenand T N Nguyen ldquoStatistical learning of api fully qualified names incode snippets of online forumsrdquo in 2018 IEEEACM 40th InternationalConference on Software Engineering (ICSE) IEEE 2018 pp 632ndash642

[24] F Lv H Zhang J-g Lou S Wang D Zhang and J Zhao ldquoCodehowEffective code search based on api understanding and extended booleanmodel (e)rdquo in 2015 30th IEEEACM International Conference onAutomated Software Engineering (ASE) 2015 pp 260ndash270

[25] X Gu H Zhang D Zhang and S Kim ldquoDeep api learningrdquo inProceedings of the 2016 24th ACM SIGSOFT International Symposiumon Foundations of Software Engineering ser FSE 2016 New YorkNY USA Association for Computing Machinery 2016 p 631ndash642[Online] Available httpsdoiorg10114529502902950334

[26] T Nguyen N Tran H Phan T Nguyen L Truong A T NguyenH A Nguyen and T N Nguyen ldquoComplementing global and localcontexts in representing api descriptions to improve api retrieval tasksrdquoin Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations

of Software Engineering ser ESECFSE 2018 New York NY USAAssociation for Computing Machinery 2018 p 551ndash562 [Online]Available httpsdoiorg10114532360243236036

[27] S K Bajracharya J Ossher and C V Lopes ldquoLeveraging usagesimilarity for effective retrieval of examples in code repositoriesrdquo inProceedings of the Eighteenth ACM SIGSOFT International Symposiumon Foundations of Software Engineering ser FSE rsquo10 New YorkNY USA Association for Computing Machinery 2010 p 157ndash166[Online] Available httpsdoiorg10114518822911882316

[28] X Ye H Shen X Ma R Bunescu and C Liu ldquoFrom wordembeddings to document similarities for improved information retrievalin software engineeringrdquo in Proceedings of the 38th InternationalConference on Software Engineering ser ICSE rsquo16 New YorkNY USA Association for Computing Machinery 2016 p 404ndash415[Online] Available httpsdoiorg10114528847812884862

[29] M Roj ldquoExploiting user knowledge during retrieval of semanticallyannotated api operationsrdquo in Proceedings of the Fourth Workshop onExploiting Semantic Annotations in Information Retrieval ser ESAIRrsquo11 New York NY USA Association for Computing Machinery 2011p 21ndash22 [Online] Available httpsdoiorg10114520647132064726

[30] Q Huang X Xia Z Xing D Lo and X Wang ldquoApi methodrecommendation without worrying about the task-api knowledge gaprdquo in2018 33rd IEEEACM International Conference on Automated SoftwareEngineering (ASE) 2018 pp 293ndash304

[31] W Wang Y Zhang Z Zeng and G Xu ldquoTransˆ 3 A transformer-based framework for unifying code summarization and code searchrdquoarXiv preprint arXiv200303238 2020

[32] R Shahbazi R Sharma and F H Fard ldquoApi2com On the improvementof automatically generated code comments using api documentationsrdquoarXiv preprint arXiv210310668 2021

[33] A Alhamzeh M Bouhaouel E Egyed-Zsigmond and J MitrovicldquoDistilbert-based argumentation retrieval for answering comparativequestionsrdquo Working Notes of CLEF 2021

[34] V Dibia ldquoNeuralqa A usable library for question answering (con-textual query expansion+ bert) on large datasetsrdquo arXiv preprintarXiv200715211 2020

[35] L d N Vale and M d A Maia ldquoTowards a question answering assistantfor software development using a transformer-based language modelrdquoarXiv preprint arXiv210309423 2021

[36] M Ciniselli N Cooper L Pascarella A Mastropaolo E AghajaniD Poshyvanyk M Di Penta and G Bavota ldquoAn empirical study onthe usage of transformer models for code completionrdquo arXiv preprintarXiv210801585 2021

[37] C Treude and M P Robillard ldquoAugmenting api documentation withinsights from stack overflowrdquo in 2016 IEEEACM 38th InternationalConference on Software Engineering (ICSE) IEEE 2016 pp 392ndash403

[38] M Squire ldquordquo should we move to stack overflowrdquo measuring the utilityof social media for developer supportrdquo in 2015 IEEEACM 37th IEEEInternational Conference on Software Engineering vol 2 IEEE 2015pp 219ndash228

[39] M Ahasanuzzaman M Asaduzzaman C K Roy and K A SchneiderldquoClassifying stack overflow posts on api issuesrdquo in 2018 IEEE 25th in-ternational conference on software analysis evolution and reengineering(SANER) IEEE 2018 pp 244ndash254

[40] S Baltes C Treude and M P Robillard ldquoContextual documentationreferencing on stack overflowrdquo IEEE Transactions on Software Engi-neering 2020

[41] G Uddin and M P Robillard ldquoHow api documentation failsrdquo Ieeesoftware vol 32 no 4 pp 68ndash75 2015

[42] T Zhang G Upadhyaya A Reinhardt H Rajan and M Kim ldquoArecode examples on an online qampa forum reliable a study of api misuseon stack overflowrdquo in 2018 IEEEACM 40th International Conferenceon Software Engineering (ICSE) IEEE 2018 pp 886ndash896

[43] S Meldrum S A Licorish and B T R Savarimuthu ldquoCrowdsourcedknowledge on stack overflow A systematic mapping studyrdquo in Proceed-ings of the 21st International Conference on Evaluation and Assessmentin Software Engineering 2017 pp 180ndash185

[44] A M Rocha and M A Maia ldquoAutomated api documentation withtutorials generated from stack overflowrdquo in Proceedings of the 30thBrazilian Symposium on Software Engineering 2016 pp 33ndash42

[45] C Gomez B Cleary and L Singer ldquoA study of innovation diffusionthrough link sharing on stack overflowrdquo in 2013 10th Working Confer-ence on Mining Software Repositories (MSR) IEEE 2013 pp 81ndash84

11

[46] J. Li, Z. Xing, D. Ye, and X. Zhao, “From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 1127–1133.

[47] A. T. Nguyen, P. C. Rigby, T. Nguyen, D. Palani, M. Karanfil, and T. N. Nguyen, “Statistical translation of english texts to api code templates,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2018, pp. 194–205.

[48] T. Steiner, R. Verborgh, J. Gabarro Valles, and R. Van de Walle, “Adding meaning to social network microposts via multiple named entity disambiguation apis and tracking their data provenance,” International Journal of Computer Information Systems and Industrial Management, vol. 5, pp. 69–78, 2013.

[49] M. Karimzadeh, W. Huang, S. Banerjee, J. O. Wallgrun, F. Hardisty, S. Pezanowski, P. Mitra, and A. M. MacEachren, “Geotxt: A web api to leverage place references in text,” in Proceedings of the 7th Workshop on Geographic Information Retrieval, ser. GIR ’13. New York, NY, USA: Association for Computing Machinery, 2013, pp. 72–73. [Online]. Available: https://doi.org/10.1145/2533888.2533942

[50] S. Patwardhan, S. Banerjee, and T. Pedersen, “Senserelate::Targetword-a generalized framework for word sense disambiguation,” in ACL, vol. 2005, 2005, pp. 73–76.

[51] R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on twitter data using word sense disambiguation,” in 2015 International Conference on Control Communication Computing India (ICCC), 2015, pp. 638–641.

[52] L. Foppiano and L. Romary, “entity-fishing: a dariah entity recognition and disambiguation service,” Journal of the Japanese Association for Digital Humanities, vol. 5, no. 1, pp. 22–60, 2020.

[53] S. Zwicklbauer, C. Seifert, and M. Granitzer, “Do we need entity-centric knowledge bases for entity disambiguation?” in Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013, pp. 1–8.

[54] A. Mandalios, K. Tzamaloukas, A. Chortaras, and G. Stamou, “Geek: Incremental graph-based entity disambiguation,” in LDOW@WWW, 2018.

[55] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning, “Combining heterogeneous classifiers for word sense disambiguation,” in Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, 2002, pp. 74–80.

[56] P. Chen, W. Ding, C. Bowes, and D. Brown, “A fully unsupervised word sense disambiguation method using dependency knowledge,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009, pp. 28–36.

  • I Introduction
  • II Preliminaries
    • II-A DATYS
    • II-B CodeBERT
  • III Approach Overview
    • III-A Task Definition
    • III-B Architecture
  • IV FACOS
    • IV-A DATYS+
    • IV-B API relevance embedding
    • IV-C API relevance classifier
    • IV-D Computing joint relevance score
  • V Experiment
    • V-A Dataset and Experimental Settings
    • V-B Metrics
    • V-C Research Questions
  • VI Result
    • VI-A RQ1 FACOS Effectiveness
    • VI-B RQ2 Ablation Study
    • VI-C RQ3 Effect of the weighting factor
  • VII Discussion
    • VII-A Cases where FACOS outperforms DATYS
    • VII-B Case where FACOS fail to exclude irrelevant threads
    • VII-C Adding Semantic Information for API Content Search
    • VII-D Threats to validity
  • VIII Related Work
    • VIII-A API Disambiguation
    • VIII-B API Resource Retrieval
    • VIII-C Contribution of StackOverflow for API documentation
    • VIII-D Word Sense and Entity Disambiguation Study
  • IX Conclusion and Future Work
  • References

[4] C Parnin C Treude L Grammel and M-A Storey ldquoCrowd docu-mentation Exploring the coverage and the dynamics of api discussionson stack overflowrdquo Georgia Institute of Technology Tech Rep vol 112012

[5] G Uddin and F Khomh ldquoAutomatic mining of opinions expressed aboutapis in stack overflowrdquo IEEE Transactions on Software Engineering2019

[6] K Luong F Thung and D Lo ldquoDisambiguating mentions of apimethods in stack overflow via type scopingrdquo in ICSME IEEE 2021

[7] Q Huang X Xia Z Xing D Lo and X Wang ldquoApi methodrecommendation without worrying about the task-api knowledge gaprdquoin ASE IEEE 2018

[8] M M Rahman C K Roy and D Lo ldquoRack Automatic api recommen-dation using crowdsourced knowledgerdquo in SANER 2016 vol 1 IEEE2016

[9] Z Feng D Guo D Tang N Duan X Feng M Gong L Shou B QinT Liu D Jiang et al ldquoCodebert A pre-trained model for programmingand natural languagesrdquo arXiv preprint arXiv200208155 2020

[10] J Devlin M-W Chang K Lee and K Toutanova ldquoBert Pre-trainingof deep bidirectional transformers for language understandingrdquo arXivpreprint arXiv181004805 2018

[11] A Vaswani N Shazeer N Parmar J Uszkoreit L Jones A N GomezŁ Kaiser and I Polosukhin ldquoAttention is all you needrdquo in Advancesin neural information processing systems 2017 pp 5998ndash6008

[12] K Clark M-T Luong Q V Le and C D Manning ldquoElectra Pre-training text encoders as discriminators rather than generatorsrdquo arXivpreprint arXiv200310555 2020

[13] Y Liu M Ott N Goyal J Du M Joshi D Chen O Levy M LewisL Zettlemoyer and V Stoyanov ldquoRoberta A robustly optimized bertpretraining approachrdquo arXiv preprint arXiv190711692 2019

[14] D Ye L Bao Z Xing and S-W Lin ldquoApireal an api recognitionand linking approach for online developer forumsrdquo Empirical SoftwareEngineering vol 23 no 6 pp 3129ndash3160 2018

[15] Q Huang E Shihab X Xia D Lo and S Li ldquoIdentifying self-admittedtechnical debt in open source projects using text miningrdquo EmpiricalSoftware Engineering vol 23 no 1 pp 418ndash451 2018

[16] G A A Prana C Treude F Thung T Atapattu and D Lo ldquoCategoriz-ing the content of github readme filesrdquo Empirical Software Engineeringvol 24 no 3 pp 1296ndash1327 2019

[17] C K Saifullah M Asaduzzaman and C K Roy ldquoLearning fromexamples to find fully qualified names of api elements in code snippetsrdquoin ASE IEEE 2019

[18] G Antoniol G Canfora G Casazza A De Lucia and E MerloldquoRecovering traceability links between code and documentationrdquo TSEvol 28 no 10 2002

[19] A Marcus and J I Maletic ldquoRecovering documentation-to-source-codetraceability links using latent semantic indexingrdquo in ICSE IEEE 2003

[20] A Bacchelli M Lanza and R Robbes ldquoLinking e-mails and sourcecode artifactsrdquo in ICSE 2010

[21] B Dagenais and M P Robillard ldquoRecovering traceability links betweenan api and its learning resourcesrdquo in 2012 34th international conferenceon software engineering (icse) IEEE 2012 pp 47ndash57

[22] S Subramanian L Inozemtseva and R Holmes ldquoLive api documenta-tionrdquo in Proceedings of the 36th International Conference on SoftwareEngineering 2014 pp 643ndash652

[23] H Phan H A Nguyen N M Tran L H Truong A T Nguyenand T N Nguyen ldquoStatistical learning of api fully qualified names incode snippets of online forumsrdquo in 2018 IEEEACM 40th InternationalConference on Software Engineering (ICSE) IEEE 2018 pp 632ndash642

[24] F Lv H Zhang J-g Lou S Wang D Zhang and J Zhao ldquoCodehowEffective code search based on api understanding and extended booleanmodel (e)rdquo in 2015 30th IEEEACM International Conference onAutomated Software Engineering (ASE) 2015 pp 260ndash270

[25] X Gu H Zhang D Zhang and S Kim ldquoDeep api learningrdquo inProceedings of the 2016 24th ACM SIGSOFT International Symposiumon Foundations of Software Engineering ser FSE 2016 New YorkNY USA Association for Computing Machinery 2016 p 631ndash642[Online] Available httpsdoiorg10114529502902950334

[26] T Nguyen N Tran H Phan T Nguyen L Truong A T NguyenH A Nguyen and T N Nguyen ldquoComplementing global and localcontexts in representing api descriptions to improve api retrieval tasksrdquoin Proceedings of the 2018 26th ACM Joint Meeting on EuropeanSoftware Engineering Conference and Symposium on the Foundations

of Software Engineering ser ESECFSE 2018 New York NY USAAssociation for Computing Machinery 2018 p 551ndash562 [Online]Available httpsdoiorg10114532360243236036

[27] S K Bajracharya J Ossher and C V Lopes ldquoLeveraging usagesimilarity for effective retrieval of examples in code repositoriesrdquo inProceedings of the Eighteenth ACM SIGSOFT International Symposiumon Foundations of Software Engineering ser FSE rsquo10 New YorkNY USA Association for Computing Machinery 2010 p 157ndash166[Online] Available httpsdoiorg10114518822911882316

[28] X Ye H Shen X Ma R Bunescu and C Liu ldquoFrom wordembeddings to document similarities for improved information retrievalin software engineeringrdquo in Proceedings of the 38th InternationalConference on Software Engineering ser ICSE rsquo16 New YorkNY USA Association for Computing Machinery 2016 p 404ndash415[Online] Available httpsdoiorg10114528847812884862

[29] M Roj ldquoExploiting user knowledge during retrieval of semanticallyannotated api operationsrdquo in Proceedings of the Fourth Workshop onExploiting Semantic Annotations in Information Retrieval ser ESAIRrsquo11 New York NY USA Association for Computing Machinery 2011p 21ndash22 [Online] Available httpsdoiorg10114520647132064726

[30] Q Huang X Xia Z Xing D Lo and X Wang ldquoApi methodrecommendation without worrying about the task-api knowledge gaprdquo in2018 33rd IEEEACM International Conference on Automated SoftwareEngineering (ASE) 2018 pp 293ndash304

[31] W Wang Y Zhang Z Zeng and G Xu ldquoTransˆ 3 A transformer-based framework for unifying code summarization and code searchrdquoarXiv preprint arXiv200303238 2020

[32] R Shahbazi R Sharma and F H Fard ldquoApi2com On the improvementof automatically generated code comments using api documentationsrdquoarXiv preprint arXiv210310668 2021

[33] A Alhamzeh M Bouhaouel E Egyed-Zsigmond and J MitrovicldquoDistilbert-based argumentation retrieval for answering comparativequestionsrdquo Working Notes of CLEF 2021

[34] V Dibia ldquoNeuralqa A usable library for question answering (con-textual query expansion+ bert) on large datasetsrdquo arXiv preprintarXiv200715211 2020

[35] L d N Vale and M d A Maia ldquoTowards a question answering assistantfor software development using a transformer-based language modelrdquoarXiv preprint arXiv210309423 2021

[36] M Ciniselli N Cooper L Pascarella A Mastropaolo E AghajaniD Poshyvanyk M Di Penta and G Bavota ldquoAn empirical study onthe usage of transformer models for code completionrdquo arXiv preprintarXiv210801585 2021

[37] C Treude and M P Robillard ldquoAugmenting api documentation withinsights from stack overflowrdquo in 2016 IEEEACM 38th InternationalConference on Software Engineering (ICSE) IEEE 2016 pp 392ndash403
