GSoC 2020 (GitHub Importer Project)

dernDren1611 · March 22, 2020, 6:18am

Hello Everybody,
I am Prastik, a GSoC '20 aspirant. I found the “GitHub Importer” project quite fascinating and am working towards a proposal for it. Having only recently started on the problem statement, I have the following questions. (Some of them may be obvious, but I want to make sure my understanding is correct before putting it in the proposal.)

Understanding of an Importer: In the Confluence Importer project, we export an article from Confluence as XML, import it into the XWiki instance, and then convert it into XWiki-compatible form using a filter converter. So is the importer here the filter extension that parses the Confluence XML before the conversion to XWiki? And by “GitHub Importer”, do you mean a parser (filter) that reads/streams GitHub Markdown before converting it to XWiki?
I see something very similar to a GitHub importer here. How would the new importer differ from this script? Judging from this thread, do we need an extension rather than just a script? Fact check: would the new importer work much like the existing script, or in what way would the desired importer differ from it?
For Confluence or DokuWiki, we can already export their content as XML or text. For GitHub pages or a GitHub wiki, do we instead take the existing .md files directly as input and pass them to the filter? In that case there would be no export step, and the new importer would import the content directly using API calls. Is that right?

Please correct me if I have misunderstood how any of this works.
Thank You!

Gentle Ping: @vmassol @bartkummel
Thank You!

vmassol · March 22, 2020, 11:02am

Hi Prastik. Welcome to XWiki and GSOC 2020!

No, the goal of the importer is to import GitHub pages, not Confluence pages.

Yes, it means being able to import GitHub pages written in GitHub-flavored Markdown and convert them to XWiki Syntax 2.1.
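To make the target format concrete, here is a toy, line-based sketch of how a few GitHub-flavored Markdown constructs (ATX headings, inline links, inline code) correspond to XWiki Syntax 2.1. It is only an illustration of the syntax mapping, written in Python for brevity: the regular expressions and the `convert` function are hypothetical, closing `#` sequences, nesting and most other constructs are not handled, and a production importer would parse Markdown properly rather than rewrite text with regexes.

```python
import re

# Hypothetical toy converter: illustrates the GFM -> XWiki Syntax 2.1 mapping only.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")     # "## Title" (space required, as in GFM)
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")     # [label](url)
INLINE_CODE = re.compile(r"`([^`]+)`")              # `code`


def convert_line(line: str) -> str:
    """Convert one Markdown line: headings, inline links and inline code only."""
    match = HEADING.match(line)
    if match:
        # "## Title" -> "== Title ==" (XWiki uses one "=" per heading level)
        marker = "=" * len(match.group(1))
        line = f"{marker} {match.group(2)} {marker}"
    # [label](url) -> [[label>>url]]
    line = LINK.sub(r"[[\1>>\2]]", line)
    # `code` -> ##code## (monospace in XWiki 2.1)
    line = INLINE_CODE.sub(r"##\1##", line)
    return line


def convert(markdown: str) -> str:
    """Convert a Markdown document line by line."""
    return "\n".join(convert_line(ln) for ln in markdown.splitlines())
```

For example, `convert("## Setup")` gives `== Setup ==`, and a line containing `[docs](https://example.org)` becomes `[[docs>>https://example.org]]`.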

Very close. It’s about making this snippet production ready and using the Filter Stream architecture.

Yes, correct.

The new extension would be like the MediaWiki or Confluence extensions, but for GitHub pages.

AFAIK there’s no REST API to export GitHub pages or to access their content, so the solution will be to “git clone” the repo locally using the XWiki Git module (xwiki-contrib/api-git on GitHub, an API to execute Git commands inside XWiki), which internally uses Eclipse JGit. The repo would be cloned into the XWiki permanent directory (as is done by the XWiki GitHub Stats Application, see https://extensions.xwiki.org/xwiki/bin/view/Extension/GitHub%20Application), and the pages would then be accessed as files from the local filesystem using the JGit API.
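As a rough sketch of the last step (reading the pages once the repo is cloned), the snippet below assumes the repository already sits in a local directory `clone_dir`, for example after a clone done through the Git module. It is written in Python for brevity and uses plain filesystem access instead of the JGit API; the function name `load_markdown_pages` and the choice of keying pages by their path relative to the clone root are hypothetical.

```python
from pathlib import Path


def load_markdown_pages(clone_dir: Path) -> dict[str, str]:
    """Hypothetical helper: map each .md file's relative path to its text.

    Assumes clone_dir is an already-cloned working copy.
    """
    pages = {}
    for path in sorted(clone_dir.rglob("*.md")):
        rel = path.relative_to(clone_dir)
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in rel.parts or not path.is_file():
            continue
        pages[rel.as_posix()] = path.read_text(encoding="utf-8")
    return pages
```

Each entry could then be handed to the Markdown-to-XWiki conversion; how relative paths should map to XWiki page names is a design decision left open here.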

Now, this is just the idea I have FTM, we can certainly improve it if you have other ideas.

Hope it helps

dernDren1611 · March 22, 2020, 11:41am

Thank you so much, @vmassol. I’ll update the thread if I have more questions.

bartkummel · March 22, 2020, 11:59am

Hi @dernDren1611,

Great that you’re looking into making my script production ready as a proper importer. I agree with the answers of @vmassol. Please also take a look at the known issues of the script; I think you will encounter at least some of them too.

Best regards,
Bart Kummel

dernDren1611 · March 22, 2020, 12:10pm

Thank you, @bartkummel!
I’ll get back to you on this!