# 1 The PDF document

The BigData Analytics example is a technical multi-page document based on a single-page layout. Besides paragraphs, the document includes lists, tables, images and page numbering.

The example below shows the original document versus the generated one (header with a red line, and page numbering with total pages).

*Original (left) versus generated (right)*

# 2 The source code

The following script code shows how to generate the document.

<script>
var PAGE_EVEN             = "Even";
var COLOR_FONT_BLUE       = "#1A2E62";
var COLOR_TABLE_ROW_BLUE  = "#C9D7E9";
var COLOR_TABLE_LINE_BLUE = "#6E93C5";
var COLOR_RED             = "red";
var FILE_PATH			  = "https://bitbucket.org/deister/axional-docs-resources/raw/master/FOP/BigDataAnalytics/";

// ====================================================================
// Create document layout
// ====================================================================
var root = new Ax.fop.DocumentBuilder()
// margin: left, right, top, bottom
.build();

// ====================================================================
// setup margins
// ====================================================================
root.getSimplePageMaster(PAGE_EVEN).setExtentTop(1.0);

// ====================================================================
// Header static content
// ====================================================================
root.getPageSequence(PAGE_EVEN).setInitialPageNumber(1);
root.getStaticContentBefore(PAGE_EVEN).getWrapper()
.setFontSize(9)
.setFontFamily("Candara")
.setTextAlign("center")
.setFontWeight("200")
.setColor(COLOR_FONT_BLUE)
.setBorderBottomWidth(.5)
.setBorderBottomStyle("solid")
.setBorderBottomColor(COLOR_RED);

// ====================================================================
// Footer static content: page number
// ====================================================================
var pagenumber = root.getStaticContentAfter(PAGE_EVEN).getWrapper()
.setFontSize(8)
.setFontFamily("Candara")
.setTextAlign("end")
.setFontWeight("100")
.setColor(COLOR_FONT_BLUE)
.setBorderTopWidth(.5)
.setBorderTopStyle("solid")
.setBorderTopColor(COLOR_RED)

.putPageNumber()
.putPageNumberCitation(PAGE_EVEN)
;

// ====================================================================
// Set default font/color for all document
// ====================================================================
var wrapper =
root.getBodyFlow(PAGE_EVEN)
.getWrapper()
.setFontFamily("Candara")
.setFontSize(9)
.setColor(COLOR_FONT_BLUE)
;

// ====================================================================
// Create standard paragraph format
// ====================================================================
var para       = root.createBlockProperties().setSpaceBefore(10, "pt").setTextAlign("justify").setLineHeight("12.5pt");
var center     = root.createBlockProperties().setTextAlign("center");
var sectionL1  = root.createBlockProperties().setSpaceBefore(10, "pt").setFontSize(16).setFontWeight("bold").setKeepWithNext("always");
var sectionL2  = root.createBlockProperties().setSpaceBefore(10, "pt").setFontSize(10).setFontWeight("bold").setKeepWithNext("always").setColor("cornflowerblue");
var tableTitle = root.createBlockProperties().setSpaceBefore(9, "pt").setFontSize(10).setTextAlign("center");

wrapper.addBlock("Big Data Analytics = Machine Learning + Cloud Computing").setSpaceBefore(20, "pt").setFontSize(18).setFontWeight("bold").setTextAlign("center");

// ====================================================================
// SECTION 1.1
// ====================================================================

wrapper.addBlock(para, "Although the term Big Data has become popular, there is no general consensus about what it really means. Often, many professional data analysts take the process of Extraction, Transformation and Load (ETL) for large datasets to be the connotation of Big Data. A popular description of Big Data is based on three attributes of data: volume, velocity, and variety (or 3Vs). Nevertheless, it does not capture all the aspects of Big Data accurately. In order to provide a comprehensive meaning of Big Data, we will investigate the term from a historical perspective and see how it has evolved from yesterday’s meaning to today’s connotation.");
wrapper.addBlock(para, "Historically, the term Big Data is quite vague and ill-defined. It is not a precise term and does not carry a particular meaning other than the notion of its size. The word “Big” is too generic. The question how “big” is big and how “small” is small [1] is relative to time, space and circumstance. From an evolutionary perspective, the size of “Big Data” is always evolving. If we use the current global Internet traffic capacity [2] as a measuring yardstick, the meaning of Big Data’s volume would lie between the Terabyte (TB, 10¹² or 2⁴⁰ bytes) and Zettabyte (ZB, 10²¹ or 2⁷⁰ bytes) range. Based on historical data traffic growth rates, Cisco claimed that humanity entered the ZB era in 2015 [2]. To understand the significance of the data volume’s impact, let us glance at the average size of different data files shown in Table 1.");

// ====================================================================
// Data table
// ====================================================================
var table = wrapper
.setSpaceBefore(10, "pt")
.setSpaceAfter(10, "pt")
.setBorderTopColor(COLOR_TABLE_LINE_BLUE)
.setBorderTopStyle("solid")
.setBorderBottomColor(COLOR_TABLE_LINE_BLUE)
.setBorderBottomStyle("solid")
;

table.getBody().setStrippedColor(COLOR_TABLE_ROW_BLUE);

table.getBody().addRow([ "Web Page", "1.6 - 2 MB",   "Ave 100 objects" ]);
table.getBody().addRow([ "eBook",    "1 - 5 MB",     "200-350 pages" ]);
table.getBody().addRow([ "Song",     "3.5 - 5.8 MB", "Ave 1.9 MB per minute (MP3 at 256 Kbps, 3 mins)" ]);
table.getBody().addRow([ "Movie",    "1.6 - 2 MB",   "60 frames per second (MPEG-4 format, Full High Definition, 2 hours)" ]);

// ====================================================================
//
// TODO: align
// TODO: strip rows
//
// ====================================================================

for (var row of table.getHeader().getRows()) {
row.forEach(cell => {
});
}

for (var row of table.getBody().getRows()) {
row.setHeight(14.0, "pt");
row.forEach(cell => {
cell.setDisplayAlign("center");
});
if (!row.isLast())
row.setBorderBottomColor(COLOR_TABLE_LINE_BLUE).setBorderBottomStyle("solid");
}

// ====================================================================
// Table caption
// ====================================================================
wrapper.addBlock(tableTitle, "Table 1: Typical Size of Different Data Files");

wrapper.addBlock(para, "The main aim of this chapter is to provide a historical view of Big Data and to argue that Big Data is not just 3Vs, but rather 3²Vs, or 9Vs. These additional Big Data attributes reflect the real motivation behind Big Data Analytics (BDA). We believe that these expanded features clarify some basic questions about the essence of BDA: what problems Big Data can address, and which problems should not be confused with BDA. These issues are covered in the chapter through an analysis of historical developments along with the associated technologies that support Big Data processing. The rest of the chapter is organised into eight sections as follows:");

// ====================================================================
// Data list
// ====================================================================
// NOTE: "addList" is an assumed name; the original list-builder call
// is truncated in the source
wrapper.addList([
"Historical Review for Big Data",
"Interpretation of Big Data 3Vs, 4Vs and 6Vs",
"Defining Big Data from 3Vs to 3²Vs",
"Big Data and Machine Learning",
"Big Data and Cloud Computing",
"ML + CC " + "<font family='Symbol'>\u2192</font>" + " BDA and Guidelines",
"Conclusion"
]).setSpaceBefore(15, "pt");

// ====================================================================
// SECTION 1.2
// ====================================================================
wrapper.addBlock(sectionL1, "1.2 A Historical Review of Big Data");
wrapper.addBlock(para, "In order to capture the essence of Big Data, we provide the origin and history of BDA and then propose a precise definition of BDA.");

// ====================================================================
// SECTION 1.2.1
// ====================================================================
wrapper.addBlock(sectionL2, "1.2.1 The Origin of Big Data");

wrapper.addBlock(para, "Several studies have been conducted on historical views and developments in the BDA area. Gil Press [3] provided a short history of Big Data starting from 1944, based on Rider’s work [4]. He covered 68 years of the evolution of Big Data between 1944 and 2012 and illustrated 32 Big Data related events in recent data science history. As Press indicated in his article, the fine line between the growth of data and Big Data has become blurred. Very often, the growth rate of data has been referred to as an “information explosion” (although “data” and “information” are often used interchangeably, the two terms have different connotations). Press’ study is quite comprehensive and covers BDA events up to December 2013. Since then, there have been many relevant Big Data events. Nevertheless, Press’ review did cover both Big Data and Data Science events. To this extent, the term Data Science could be considered as a complementary meaning of BDA.");
wrapper.addBlock(para, "In comparison with Press’ review, Frank Ohlhorst [5] traced the origin of Big Data to 1880, when the 10th US census was held. The real problem during the 19th century was a statistics issue: how to survey and document 50 million North-American citizens. Although Big Data may involve the computation of some statistical elements, the two terms have different interpretations today. Similarly, Winshuttle [6] believes the origin of Big Data was in the 19th century. They argue that if data sets are so large and so complex as to be beyond traditional processing and management capability, then these data sets can be considered “Big Data”. In comparison to Press’, Winshuttle’s review emphasizes Enterprise Resource Planning (ERP) and implementation on cloud infrastructure. Moreover, the review also makes a prediction for data growth to 2020. The total time span of its review was more than 220 years. Winshuttle’s Big Data history included many SAP events and its data products, such as HANA.");
wrapper.addBlock(para, "The longest span of historical review for Big Data belongs to Bernard Marr’s description [7]. He traced the origin of Big Data back to 18,000 BCE. Marr argued that we should pay attention to the historical foundations of Big Data, which are the different approaches humans have used to capture, store, analyze and retrieve both data and information. Furthermore, Marr believed that the first person to cast the term “Big Data” was Erik Larson [9], who wrote an article for Harper’s Magazine, subsequently reprinted in The Washington Post in 1989, containing two sentences with the words Big Data: “The keepers of Big Data say they do it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended.”");
wrapper.addBlock(para, "In contrast, Steve Lohr [10] disagrees with Marr’s view. He argues that just adopting the term alone might not carry today’s Big Data connotation: “The term Big Data is so generic that the hunt for its origin was not just an effort to find an early reference to those two words being used together. Instead, the goal was the early use of the term that suggests its present interpretation — that is, not just a lot of data, but different types of data handled in new ways”. This is an important point. Based on this reasoning, we consider Cox and Ellsworth [8] as the origin of Big Data because they assigned a relatively accurate meaning to the existing view of Big Data, stating “...data sets are generally quite large, taxing the capacities of main memory, local disk and even remote disk. We call this the problem of Big Data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk...”. Although today’s term may have a more extended meaning than Cox and Ellsworth’s, this definition reasonably accurately reflects today’s connotation.");
wrapper.addBlock(para, "Another historical review was contributed by Visualizing.org [11]. It focused on the timeline of how to implement BDA. Its historical description is mainly driven by events related to the Big Data push by many Internet and IT companies, such as Google, YouTube, Yahoo, Facebook, Twitter and Apple. In particular, it highlighted the significant role of Hadoop in the history of BDA. Based on these studies, we show the history of Big Data, Hadoop and its ecosystem in Figure 1.");

if (FILE_PATH.startsWith("http")) {
// Figure 1 image insertion (code not shown in the source)
}
wrapper.addBlock(tableTitle, "Figure 1: A Short History of Big Data");
wrapper.addBlock(para, "Undoubtedly, there will be many different views based on different interpretations of BDA. This will inevitably lead to many debates about Big Data’s implications, pros and cons.");

// ====================================================================
// SECTION 1.2.2
// ====================================================================
wrapper.addBlock(sectionL2, "1.2.2 Debates of Big Data Implications");
wrapper.addBlock(para, "There have been many debates regarding Big Data’s implications during the past few years. Many advocates declare Big Data a new rock star [20] that will be the next frontier [21], [22] for innovation, competition and productivity, because data is embedded in the modern human being’s life. Data generated by both machines and humans every second are a by-product of all other activities. They may even become the new epistemologies [23] in science. To a certain degree, Mayer and Cukier [24] argued that Big Data would revolutionize our way of thinking, working and living. They believe that massive quantitative data accumulation will lead to qualitative advances at the core of BDA - machine learning, parallelism, metadata and predictions. “Big Data will be a source of new economic value and innovation”. Their conclusion is that data can speak for itself and we should let the data speak.");
wrapper.addBlock(para, "To a certain extent, Montjoye et al. [25] echoed the above conclusion. They demonstrated that it is highly probable (over 90% reliability) to re-identify a person with as few as four spatiotemporal data points (credit card transactions in a shopping mall) by leveraging Big Data Analytics. Their conclusion is that “large scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities and perform research.”");
wrapper.addBlock(para, "In contrast, some argue that Big Data is inconclusive, overstated, exaggerated and misinformed by the media, and that data cannot speak for itself [12]. It does not matter how big a dataset is. It could be just another delusion because “it is like having billions of monkeys typing, one of them will write Shakespeare.” [13]. In Dobelli’s terms [14], we should “never judge a decision by its outcome – outcome bias”. In other words, if one of the monkeys can type Shakespeare, we cannot conclude or infer that a monkey has sufficient intelligence to be Shakespeare.");
wrapper.addBlock(para, "Gary Drenik [15] believed that the sentiment of the overeager adoption of Big Data is more like “Extraordinary Popular Delusions and the Madness of Crowds”, the phrase Charles Mackay [16] used in his famous book’s title. Psychologically, it is a kind of crowd emotion that seems to have a perpetual feedback loop. Drenik backed this “madness” with Mackay’s warning: “We find that whole communities suddenly fix their minds upon one subject, and go mad in its pursuit; that millions of people become simultaneously impressed with one delusion, and run after it, till their attention is caught by some new folly more captivating than the first.” The issue Drenik noticed regarding Big Data was that the hype had overtaken reality, leaving little time to think. Former Obama campaign CTO Harper Reed has a real story about the adoption of BDA; his remark was that Big Data is “literally hard” and “expensive” [34].");
wrapper.addBlock(para, "Danah Boyd et al. [17] are quite sceptical regarding Big Data in terms of its volume. They argued that bigger data are not always better data from a social science perspective. In response to “The End of Theory” [18] proposition, Boyd asserted that theory or methodology is still highly relevant for today’s statistical inference and that “The size of data should fit the research question being asked; in some cases, small is best”. They suggested that we should not pay too much attention to the volume of data. Philosophically, the critique is similar to the 19th-century debate between John Stuart Mill (Mill’s five classical or empirical methods) and his critics [35], in which Mill’s critics argued that it is impossible to bear on an intelligent question by just ingesting as much data as possible without some theory or hypothesis. This means that we cannot make Big Data do the work of theory.");
wrapper.addBlock(para, "Another Big Data critique comes from David Lazer et al. [19]. They presented the Google Flu Trends (GFT) prediction as a parable and identified two issues (Big Data hubris and algorithm dynamics) that contributed to GFT’s mistakes. The issue of “Big Data hubris” is that some observers believe BDA can replace traditional data mining completely. The issue of “algorithm dynamics” is “the changes made by (Google’s) engineers to improve the commercial service and by consumers in using that service”. In other words, changes to the search algorithms directly impact the users’ behavior, so the collected data are driven by deliberate algorithms. Lazer concluded that there are many traps in BDA, especially for social media research. Their conclusion was “we are far from a place where they (BDA) can supplant more traditional methods or theories.”");
wrapper.addBlock(para, "All these multiple views are due to different interpretations of Big Data and different implementations of BDA. This suggests that, in order to resolve these issues, we should first clarify the definition of the term BDA and then locate the clash points based on that shared term.");

// ====================================================================
// SECTION 1.3
// ====================================================================
wrapper.addBlock(sectionL1, "1.3 Historical Interpretation of Big Data");
wrapper.addBlock(sectionL2, "1.3.1 Methodology for Defining Big Data");
wrapper.addBlock(para, "Intuitively, neither yesterday’s data volume (absolute size) nor today’s can be defined as “Big”. Moreover, today’s “Big” may become tomorrow’s “small”. In order to clarify the term Big Data precisely and settle the debate, we can investigate and understand the functions of a definition based on a combination of Robert Baird’s [26] and Irving Copi’s [27] approaches (see Figure 2).");
if (FILE_PATH.startsWith("http")) {
// Figure 2 image insertion (code not shown in the source)
}
</script>
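As a quick sanity check on the volume figures quoted in the sample text (TB = 10¹² bytes, ZB = 10²¹ bytes), the arithmetic can be reproduced in a few lines of plain JavaScript; the ~4 MB song size is an assumed midpoint of the 3.5 - 5.8 MB range in Table 1:

```javascript
// Back-of-the-envelope arithmetic for the data-volume figures in the text.
const TB = 10 ** 12;          // terabyte: 10^12 bytes (roughly 2^40)
const ZB = 10 ** 21;          // zettabyte: 10^21 bytes (roughly 2^70)
const avgSong = 4 * 10 ** 6;  // assumed ~4 MB per MP3 song (Table 1 midpoint)

console.log(ZB / TB);         // → 1000000000 (a billion terabytes per zettabyte)
console.log(ZB / avgSong);    // → 250000000000000 (songs per zettabyte)
```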
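The script's TODO mentions row striping, the effect requested via `setStrippedColor(COLOR_TABLE_ROW_BLUE)`. The alternating-row logic behind it can be sketched in plain JavaScript; the `stripeRows` helper below is illustrative only, not part of the Ax.fop API:

```javascript
// Illustrative sketch of alternate-row table striping.
// stripeRows is a hypothetical helper, not an Ax.fop API call.
function stripeRows(rows, color) {
  rows.forEach((row, i) => {
    if (i % 2 === 1) row.background = color; // stripe every second row
  });
  return rows;
}

// Four body rows, as in the Table 1 example above.
const rows = stripeRows([{}, {}, {}, {}], "#C9D7E9");
console.log(rows.map(r => r.background)); // odd rows get "#C9D7E9"
```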