December 21, 2013

Search Strategies

Twenty years ago, Marchionini [15] found that while using a CD-ROM encyclopedia to search, children had difficulty formulating search terms and frequently used
natural language or phrases instead of keywords. The younger children in the study, 8-10 year olds, were much more likely than the 11-12 year old group to use phrases or sentences, leading to unsuccessful searches. Large et al. [13] found that while using multimedia CD-ROM programs, 11- and 12-year-old children preferred to browse for information rather than to search. Schacter et al. [19] suggest that lack of planning when attempting a
complex search and a desire for the easiest path to the desired information lead children to prefer browsing to keyword search. This finding is consistent with the observation that novice adult users tend to prefer strategies that require less cognitive load [15]. Bilal [3, 4] also found that when looking for information on Yahooligans, a web directory, 12 and 13 year old
children were better at finding information by browsing than by searching, and that they browsed much more than they searched. Despite this, the children still preferred to
use keywords to search. Thus, while children might use sub-optimal search strategies and fail to find the information they need, they still may want to use keyword search interfaces.

RELATED RESEARCH

The research on children’s search interfaces includes three areas of HCI and information science research: (1) search strategies; (2) typing and spelling; (3) deciphering results. These three aspects also identify the areas of considerable challenge for children. In the sections that follow, we discuss research and challenges in each of these areas.

The Need for Research

Browsing the web exposes children to vast numbers of websites on every imaginable topic and in many different media formats (e.g. web sites, documents, videos, images). Using a search engine to find information from large numbers of disparate web pages is very different
from searching the finite and pre-determined content found in the CD-ROM applications, online digital libraries, and web directories of the past. However, most of what we know about how children search is based on studies using these kinds of sources. Today’s search engines are not only more expansive than past technologies, but more ubiquitous in a child’s world. It is not uncommon for young people to begin to go online with a parent or sibling by the age of three or four. They move from their home computers, to their schools’
computing facilities, to their mobile phones, searching for online games, information for school assignments, and random facts they are curious about because of the world around them. Today’s children are the first generation of what are being called “digital natives” [1]. As a result, it is important to explore how children search the Internet with today’s ubiquitous keyword interfaces. With the knowledge gained from such studies, we hope to design and test new interfaces and algorithms to better support the needs of children.

Author Keywords

Children, Internet, search, search engine, query formulation, typing, search results

ACM Classification Keywords

H.3.3 Information Search and Retrieval; H.5.2 User Interfaces: Graphical user interfaces (GUI) and User-centered design.

INTRODUCTION

The leading activity for all age groups on the Internet is general exploration: activities such as searching, surfing, and reading about interests, sports, and movies [6]. Recent studies in the U.S. have shown that 74% of children ages 8-18 years have access to the Internet [18]. Children make up one of the largest groups of users of computers and the Internet [17]. Despite children’s frequent use of the Internet and exposure to technology at an early age, when asked “what frustrates you most about searching on the Internet”, several child participants in our study provided some revealing answers. Child (age 7): “Writing words is hard for me because I'm not really good at the writing.” Child (age 9): “It doesn’t do all the words you say.” Child (age 11): “It's hard because you have to find the right words to put in the box.”
These challenges were just a few of several we saw when conducting our initial study on how children search the Internet. When, where, what, and how they search were prominent concerns as we interviewed the children and parents who participated. This paper describes our methods, highlights our findings, and offers considerations for the design of future Internet search interfaces for children.

ABSTRACT

Children are among the most frequent users of the Internet, yet searching and browsing the web can present many challenges. Studies over the past two decades on how children search were conducted with finite and predetermined content found in CD-ROM applications,
online digital libraries, and web directories. However, with the current popularity of the open Internet and keyword-based interfaces for searching it, more critical analysis of the challenges children face today is needed. This paper presents the findings of our initial study to
understand how children ages 7, 9, and 11 search the Internet using keyword interfaces in the home. Our research has revealed that although today’s children have been exposed to computers for most of their lives, spelling, typing, query formulation, and deciphering results are all still potential barriers to finding the information they need.

December 18, 2013

Conclusions

We have presented a framework and an implemented system for browsing and
analyzing sets of documents which are annotated with category keyword labels.
The system might be used as a support tool for domain experts who need to analyze
and summarize large document sets. It may also be used in the regular query-and-browse
cycle of a document retrieval session, to support the browsing phase.
Currently, when users face the common response of the type “1000 documents
match your query”, they need to guess in advance how they might restrict their query.
In such cases the KDT system could provide much help in figuring out the
content of these 1000 documents, and narrowing down the sets of target
documents. The KDT system is based on a compact model, which relies on rather modest assumptions. It requires annotation of documents with category keywords which are organized in a simple hierarchy. It also demonstrates the rich variety of KDD
operations that can be based on keyword co-occurrence distributions and their
comparison with the relative entropy distance measure. The simplicity of the
model makes it rather easy to implement, and the pre-computation of keyword co-occurrence distributions makes online computations very efficient. In future work we plan to extend the KDT framework to work also on co-occurrence distributions of terms and
groups of terms that were extracted directly from the texts. This way we hope to
combine these two levels of representation, namely category labels and document
terms, in analogy to the way they are often combined in retrieval queries.

Finding Trends Over Time

One of the most important needs of an analyst is the ability to follow changes
over time in the behavior of entities of interest. For example, a trend analysis tool
should be able to compare the activities that a company performed in some domain
in the past with its current activities in that domain. A possible
conclusion from such an analysis would be that a company is shifting its activities from
one domain to another. The KDT system identifies trends by comparing a distribution of data taken from one period of time to a corresponding model distribution which is constructed from data of another period. Trends are then
discovered by searching for significant deviations from the expected model, as
before. Figure I lists trends that were identified across the different quarters of
the year. The program was directed to search for significant changes in the co-occurrence
distributions of Arab League countries with any other country. For example, the first line of the top listbox shows that in the 3rd quarter there was a large increase in the proportion of articles that mention both Libya and Chad among all articles mentioning Libya (from 0% in the 2nd quarter to 35.29% in the 3rd quarter). The second line shows that the proportion of such articles in the 3rd quarter was also
much higher than in the fourth quarter (a decrease over time, again to 0%). An
analyst might then want to investigate what happened in the 3rd quarter regarding Libya
and Chad. To facilitate such an investigation, the system provides access to
the specific articles that support the trend, by double clicking on the appropriate line.
Then, a listbox containing all titles of the relevant documents appears, as in Figure J,
revealing that the cause for the trend was the fighting between Libya and Chad at that
period.

Figure I - Trends in co-occurrence of Arab League countries with other countries. The distance is measured from the period (quarter) listed in the second column (P1) to the period in the third column (P2), where each line corresponds to a large contribution to this distance. The last five columns are as in previous figures.
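
The period-to-period comparison described above can be sketched in Python as follows. This is a minimal illustration, not the KDT implementation: the function names, the counts, and the base-2 logarithm are assumptions. In particular, the additive smoothing constant `alpha` is an assumption added here so that a co-occurrence that is absent in the model period (0%, as in the Libya and Chad example) still yields a finite contribution; the source does not say how KDT handles that case.

```python
import math

def smoothed_distribution(counts, outcomes, alpha=0.5):
    """Turn raw co-occurrence counts into probabilities.

    alpha is an assumed additive smoothing constant (not from the source).
    """
    total = sum(counts.get(c, 0) for c in outcomes) + alpha * len(outcomes)
    return {c: (counts.get(c, 0) + alpha) / total for c in outcomes}

def trend_contributions(counts_p1, counts_p2, alpha=0.5):
    """Rank outcomes by their contribution to the relative entropy
    distance from period P1's distribution to period P2's distribution."""
    outcomes = sorted(set(counts_p1) | set(counts_p2))
    p = smoothed_distribution(counts_p1, outcomes, alpha)
    q = smoothed_distribution(counts_p2, outcomes, alpha)
    contrib = {c: p[c] * math.log2(p[c] / q[c]) for c in outcomes}
    return sorted(contrib.items(), key=lambda item: item[1], reverse=True)

# Hypothetical counts of articles mentioning Libya together with another
# country, for the 3rd quarter (P1) and the 2nd quarter (P2).
q3_counts = {"Chad": 6, "Egypt": 5, "USA": 6}
q2_counts = {"Chad": 0, "Egypt": 5, "USA": 5}

ranked = trend_contributions(q3_counts, q2_counts)
for country, value in ranked:
    print(f"{country}: {value:+.3f}")
```

With these made-up counts, the co-occurrence with Chad makes the largest positive contribution, which is the kind of line an analyst would then open to inspect the supporting articles.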

Finally, the user can request a graphical representation of co-occurrence frequencies
of any two categories, at a desired level of granularity of time segments. Figure K
displays the percentage of articles annotated with the category crude within the average
topic distribution of OPEC countries, across different quarters.

Figure K - Crude proportion of the topic distribution of OPEC across the year quarters



Specific comparisons

The mechanism for identifying strong associations relative to a model is also useful
for comparing conditional distributions of two specific nodes in the hierarchy. In Figure
G we measure the distance from the average topic distribution of Arab League countries
to the average topic distribution of G7 countries. This reveals the topics with which
Arab League countries are associated much more than G7 countries, like crude-oil and
wheat. Figure H shows the comparison in the opposite direction, revealing the topics with
which G7 countries are highly associated relative to the Arab League.

Figure G - Topics Profile Comparison of the Arab League countries vs. the G7 countries. Entries in the top listbox are sorted in decreasing order of their contribution to the relative entropy distance (2nd column). The 3rd and 5th columns show, respectively, the percentage of the topic in the average topic distribution of the Arab League countries and in the average topic distribution of the G7 countries. The 4th and 6th columns show, respectively, the total number of articles in which the topic appears with any Arab League country and any G7 country.

Figure H - Topics Profile Comparison of the G7 countries vs. the Arab League countries. The columns in the upper listbox are the same as in Figure G.



General associations

Another form of association can be defined by taking as the baseline model the average
distribution of the conditioned category over all possible instantiations of the conditioning
category (in the formulation of the previous sub-section, x would range over all categories of the same type, rather than over all immediate siblings). This form is demonstrated in Figure E, which lists the strongest associations found between some
country and some topic. The system also enables the user to investigate further the
subset of documents which corresponds to a certain association. In Figure E we chose to
explore the set of documents corresponding to the association between South Korea and
trade, presenting the distribution of countries within this set (lower-right listbox, specified
by the “Expand Category” pull-down menu). This reveals which countries are most
prominent in articles dealing with both South Korea and trade, conveniently linking the
browsing mechanism of Figure C to the association display screen.
In many cases, the system generates a very large number of associations, making it
difficult to draw overall conclusions. To summarize the information, the system
groups together correlations whose second component belongs to the same class in the
hierarchy. Figure F shows the clusters that were formed by the system when grouping all
the individual associations of Figure E. For example, in 43 associations of Figure E the
right hand side of the association (the topic) was a daughter of the node agriculture. The user can examine any cluster and see the specific associations it contains (lower
listbox, for the selected cluster caffeine drinks).
In addition, the system tries to provide a compact generalization for all the categories on the left hand side of the associations in the cluster. In our example, the system found that all countries that are highly correlated with caffeine drinks belong either to the OAU (African Union) or the OAS (South American countries) organizations.

Figure F - Clustering associations using the category hierarchy. In the upper listbox we can see all association clusters that were formed by the system along with their sizes (in
parentheses). In the lower listbox we see the members of the cluster that was selected in the upper listbox (caffeine drinks).
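
The grouping step described in this section can be illustrated with a short Python sketch. It assumes that each association is a (left, right) pair and that the hierarchy is available as a mapping from a keyword to its parent nodes. The function name, the data, and the parent assignments are hypothetical and are not taken from the KDT system.

```python
from collections import defaultdict

def cluster_by_parent(associations, parents):
    """Group (left, right) associations by each parent of the right-hand side.

    A keyword with several parents (the hierarchy is a DAG) lands in
    several clusters.
    """
    clusters = defaultdict(list)
    for left, right in associations:
        for parent in parents.get(right, ()):
            clusters[parent].append((left, right))
    return dict(clusters)

# Hypothetical fragment of a topic hierarchy: keyword -> parent nodes.
parents = {
    "coffee": ["caffeine drinks"],
    "cocoa": ["caffeine drinks"],
    "wheat": ["agriculture"],
}

# Hypothetical country-topic associations.
associations = [
    ("Brazil", "coffee"),
    ("Ghana", "cocoa"),
    ("Argentina", "wheat"),
]

clusters = cluster_by_parent(associations, parents)
for name, members in sorted(clusters.items()):
    # Print the cluster name with its size in parentheses, as in the listbox.
    print(f"{name} ({len(members)})")
```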



Associations relative to a class

Consider a conditional distribution of the form P(C=c | xi), where xi is a conditioning
concept. In many cases, it is natural to expect that this distribution would be similar to
other distributions of this form, in which the conditioning event is a sibling of xi. For
example, when C denotes the node commercial-activity, and xi=Ford (the car
manufacturer), we could expect a distribution that is quite similar to such distributions
where the conditioning concept is another car manufacturer (a sibling of Ford in the
hierarchy). To capture this reasoning, we use Avg P(C=c | x), the average sibling
distribution, as a model for P(C=c | xi), where x ranges over all siblings of xi
(including xi itself). In the above example, we would measure the distance from the
distribution P(C=c | Ford) to the average distribution Avg P(C=c | x), where x ranges
over all car manufacturers and C denotes the node commercial-activity. The distance
between these two distributions would be large if the activity profile of Ford differs a
lot from the average profile of other car manufacturers. Furthermore, specific points
in the distribution (specific activities) that make a large contribution to the distance are activities which are associated with Ford much more than with other car
manufacturers.
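
A minimal Python sketch of this comparison follows. It assumes an unweighted average over the siblings and a base-2 logarithm, neither of which is specified in the text. The sibling names `maker_b` and `maker_c`, the activity labels, and all probabilities are hypothetical.

```python
import math

def average_distribution(distributions):
    """Unweighted mean of several distributions (assumed form of Avg)."""
    outcomes = set().union(*distributions)
    n = len(distributions)
    return {c: sum(d.get(c, 0.0) for d in distributions) / n for c in outcomes}

def contributions(p, q):
    """Per-outcome terms p(c) * log2(p(c) / q(c)); zero-probability points
    of p are skipped because they contribute nothing to the sum."""
    return {c: pc * math.log2(pc / q[c]) for c, pc in p.items() if pc > 0}

# Hypothetical distributions P(C=c | x) over daughters of commercial-activity.
siblings = {
    "Ford": {"recalls": 0.6, "launches": 0.2, "earnings": 0.2},
    "maker_b": {"recalls": 0.1, "launches": 0.5, "earnings": 0.4},
    "maker_c": {"recalls": 0.2, "launches": 0.5, "earnings": 0.3},
}

# The average includes Ford itself, so it is positive wherever Ford's
# distribution is positive and the distance stays finite.
avg = average_distribution(list(siblings.values()))
ford_terms = contributions(siblings["Ford"], avg)
ford_distance = sum(ford_terms.values())
top_activity = max(ford_terms, key=ford_terms.get)

print(f"distance(Ford, average) = {ford_distance:.3f}")
print(f"largest contribution: {top_activity}")
```

In this made-up data the activity `recalls` contributes most to the distance, so it would be reported as associated with Ford much more than with its siblings.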
Figure D demonstrates this type of comparison, between the topic distribution of
each G7 country and the average sibling distribution of topics for all G7 countries.
The countries are sorted in decreasing order of their distance to the average distribution,
revealing that Japan is the most “atypical” G7 country (with respect to its topic
distribution) while Germany is the most typical one. The topics that made the largest
contributions to the distance for each country are also displayed. The user can
then click on any class member and get an expanded view of the comparison between
the topic distribution of this member and the average distribution. In Figure D we have
expanded the topic list of the UK (at the bottom-right list box), providing the
statistical detail for the strong associations between the UK and topics like bonds, sugar,
cocoa etc. In addition to their value in finding associations, comparisons of this type
provide a hierarchical browsing mechanism for keyword co-occurrence distributions. For
example, an analyst who is interested in studying the topic distribution in articles
dealing with G7 countries may first browse the average class distribution for G7, using a
presentation as in Figures B and C. This will reveal the major topics that are generally
common for G7 countries. Then, the presentation of Figure D would reveal the
major characteristics which are specific for each country.

Figure D - Comparison of the topic distribution of members of the G7 organization vs. the average topic distributions of the G7. Entries in the top listbox are sorted in decreasing order of their relative entropy distance to the average topic distribution (2nd column). The 3rd column shows the major topics that contributed to that distance. In the lower-right listbox, we can see detailed information about these topics, for a selected country (UK). The 2nd column shows the contribution of the topic to the relative entropy distance. The 3rd and 5th columns show, respectively, the percentage that the topic takes from the topic distribution of the specific country (3rd) and from the average topic distribution of the G7 countries (5th). The 4th and 6th columns show, respectively, the total number of articles in which the topic appears with the specific country (4th), and with any G7 country (6th).

Figure E - Country-Topic associations with a high contribution to the relative entropy distance between the topic distribution of the country and the average topic distribution for all countries. Associations are sorted in decreasing order of the relative entropy distance to the global average (3rd column). The 4th and 6th columns show, respectively, the percentage that the topic takes from the topic distribution of the specific country (4th) and from the average topic distribution of all countries (6th). The 5th and 7th columns show, respectively, the total number of articles in which the topic appears with the specific country (5th) and with any country (7th).



Finding Strong Associations

A substantial part of the KDD literature deals with finding strong statistical associations (or correlations) between data elements in the collection (e.g. Toivonen et al 1995; Kloesgen 1995a; Feldman et al, 1996). Such associations were used for various applications, including:
• Supermarket shopping list: finding correlations between user purchase preferences
• Identifying telecommunications alarm rules, as associations between system attributes and faults.
In the KDT context, we are interested in finding statistical associations between
various keywords. For example, we may identify the economic topics which are
highly associated with a certain country. The comparative approach of the previous section
enables us to focus on associations that are likely to be interesting, i.e. those associations that deviate from a baseline model. For example, we will give a higher rank to an association between a country and a topic only if this association is not typical for other countries as well.



Distribution Comparison

So far we have seen that the ability to specify keyword co-occurrence distributions provides the user with a useful mechanism for exploring subsets of documents. Taking a
KDD perspective, we are interested not only in displaying an entire distribution to the user
but also in identifying specific points in a distribution which are likely to be “interesting”.
We suggest quantifying the degree of “interest” of some data by comparing it to a
given, or an “expected”, model. For example, we may want to compare the data regarding IBM to a model constructed by some averaging of the data regarding other
computer manufacturers. Alternatively, we may want to compare the data regarding IBM
in the last year to a model constructed from the data regarding IBM in previous years.
In our case, we use keyword distributions to describe the data. We therefore need a
measure for comparing the distribution defined by the data to a model distribution.
We chose to use the relative entropy measure (or Kullback-Leibler (KL) distance), defined in information theory, though we plan to investigate other measures as well. The
KL-distance seems to be an appropriate measure for our purpose since it
measures the amount of information that we lose if we model a given distribution p by
another distribution q. Denoting the distribution of the data by p and the model
distribution by q, the distance from p(x) to q(x) measures the amount of “surprise”
in seeing p while expecting q. Formally, the relative entropy between two probability
distributions p(x) and q(x) is defined as:

$$D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

The relative entropy is always non-negative and is 0 if and only if p=q.
According to this view, interesting distributions will be those with a large
distance to the model distribution. Interesting data points will be those that make a big
contribution to the distance between the given distribution and the model (i.e., x’s
whose contribution to the sum is large). The following sections show how various
interesting patterns can be identified by measuring the relative entropy distance
between a distribution and different baseline models.
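
As a rough illustration of the measure, the following Python sketch computes the relative entropy between a data distribution p and a model distribution q, and also the contribution of each point x to the sum. The function names and the example probabilities are hypothetical, and the base-2 logarithm is an assumption (the text does not fix a base). Note that an individual contribution can be negative even though the total is never negative.

```python
import math

def kl_contributions(p, q):
    """Return the per-point terms p(x) * log2(p(x) / q(x)).

    p and q map outcomes to probabilities. Points with p(x) == 0
    contribute 0 by convention. If q(x) == 0 while p(x) > 0, the term
    (and hence the distance) is infinite.
    """
    terms = {}
    for x, px in p.items():
        if px == 0:
            terms[x] = 0.0
            continue
        qx = q.get(x, 0.0)
        terms[x] = math.inf if qx == 0 else px * math.log2(px / qx)
    return terms

def relative_entropy(p, q):
    """Sum of the per-point terms, i.e. the distance from p to q."""
    return sum(kl_contributions(p, q).values())

# Hypothetical topic distributions: p is the data, q is the expected model.
p = {"crude": 0.5, "wheat": 0.25, "trade": 0.25}
q = {"crude": 0.25, "wheat": 0.25, "trade": 0.5}

terms = kl_contributions(p, q)
distance = relative_entropy(p, q)
most_interesting = max(terms, key=terms.get)

print(f"distance = {distance:.2f} bits")
print(f"largest contribution: {most_interesting}")
```

Here `crude` is the "interesting" point in the sense used above: it appears in the data twice as often as the model expects and therefore dominates the sum.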



Keyword Co-Occurrence Distributions

All KDD operations supported by the KDT system are based on an analysis of the
keywords that annotate the articles in the collection. More specifically, KDT
computes the distribution of daughter terms relative to their siblings for all keywords in
the hierarchy. For example, the annotations of documents with daughters of the
keyword node computers may be distributed as follows: mainframes: 0.1;
work-stations: 0.4; PCs: 0.5. In formal terms, we set a node C in the hierarchy to
specify a discrete random variable whose values are denoted by its daughters, where
each occurrence of a daughter provides a data point. We denote the distribution of the
random variable by P(C=c), where c ranges over all daughters of C. The event C=c
corresponds to the annotation of a document with the daughter category c.
P(C=ci) is the proportion of annotations of documents with ci among all annotations of
documents with any daughter of C. In the example above we would say that
P(C=mainframes)=0.1, where C denotes the random variable which corresponds to
the node computers. In KDT we are most interested in conditional keyword distributions of the form P(C=c|x), where x is a conditioning event which denotes some other category
keyword. Such distributions describe the co-occurrence of the category x with all
daughters of C. Figure B shows an example of such a distribution, where C stands for the node topics and x stands for Argentina. In other words, the figure presents the distribution of topic keywords (i.e., keywords that are daughters of the topics node) in articles that are also annotated with the keyword Argentina. In Figure B the distribution is presented as a pie-chart, along with the absolute frequency of each slice in the pie:
12 articles among all articles of Argentina are annotated with sorghum, 20 with corn,
32 with grain, etc. The KDT system presents distributions in several forms,
graphical (e.g. bar-chart) or alphanumeric (see Figure C), listing absolute frequencies
or probabilities (percentage). More generally, a keyword co-occurrence distribution may be conditioned by the joint occurrence of several category keywords, and not just one. For example, Figure C displays the distribution P(C=c|x,y), where C stands for topics, x
for UK, and y for USA. In other words, this is the distribution of topics in articles that
deal with both UK and USA. The distribution is presented in the lower right
window of the screen. By letting the user specify and display
conditional keyword co-occurrence distributions, as in Figure B and Figure C,
the KDT system provides a powerful browsing mechanism for large subsets of
documents. A traditional document retrieval system enables the user to ask for all
documents containing the keywords UK and USA, but then presents the entire set of
matching documents without describing its internal structure. Typically, the documents
will be sorted by either relevance score, which would be determined in this case by
the frequency and position of the given keywords in the document, or by
chronological order. The KDT system, on the other hand, enables the user to
investigate the contents of this document set by sorting it according to the daughter
distribution of any node in the hierarchy, such as topics, countries, companies etc.
Once the documents are sorted, and the distribution is displayed, the user can
access the specific documents of each subgroup. In Figure C, for example, the
user chose to click on the 24 documents annotated with trade, which led to the
display of all titles of these documents (those annotated by UK, USA, and trade) in
the upper window of the screen.
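
The conditional distributions described in this section can be estimated directly from the annotations. The Python sketch below is one possible reading of the definition, in which each document is represented as a set of keywords. The function name and the toy collection are hypothetical and are not drawn from the Reuters data.

```python
from collections import Counter

def conditional_distribution(documents, daughters, conditions):
    """Estimate P(C=c | conditions) from keyword-annotated documents.

    documents:  iterable of keyword sets, one set per document
    daughters:  set of keywords that are daughters of the node C
    conditions: set of keywords that must all annotate a document
    Returns (probabilities, absolute frequencies).
    """
    counts = Counter()
    for keywords in documents:
        if conditions <= keywords:           # document has every conditioning keyword
            for c in keywords & daughters:   # each daughter annotation is a data point
                counts[c] += 1
    total = sum(counts.values())
    if total == 0:
        return {}, counts
    return {c: n / total for c, n in counts.items()}, counts

# Hypothetical annotated collection.
docs = [
    {"UK", "USA", "trade"},
    {"UK", "USA", "trade", "grain"},
    {"UK", "USA", "crude"},
    {"UK", "corn"},
    {"USA", "grain"},
]
topics = {"trade", "grain", "crude", "corn"}

# Distribution of topics in articles that deal with both UK and USA.
dist, freq = conditional_distribution(docs, topics, {"UK", "USA"})
for topic, prob in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{topic}: {freq[topic]} articles, {prob:.0%}")
```

Conditioning on several keywords at once only changes the `conditions` set, which mirrors the P(C=c|x,y) case in the text.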


The Text Collection

As mentioned above, the KDT system expects as input documents which are
annotated with category labels, where annotation might be achieved either
manually or automatically. In the experiments described here we used the
Reuters-22173 text categorization test collection, containing about 22,000 articles,
totaling 25 megabytes. The documents in this collection appeared on the Reuters
news wire in the late 1980’s, and were assembled and indexed with categories by
personnel from Reuters Ltd. and Carnegie Group, Inc. Further formatting and data file
production was done in 1991 and 1992 by David D. Lewis and Peter Shoemaker.
The categories in this collection are classified into only five types of tags:
countries, topics, people, organizations and stock exchanges. These five types provided
us with the skeleton of the keyword hierarchy, where each of the 5 types serves as an intermediate node in a two level hierarchy. We then enriched the hierarchy with some
additional sub-types of categories, such as agriculture and metals as daughters of the
topics node, and various international organizations (taken from the CIA Factbook on the Internet) as daughters of the countries node.



Keyword Tagging and the Keyword Hierarchy

Applying KDD operations to texts requires that documents be represented in some
structured way. We chose to base the current version of the system on the very simple
representation scheme of annotating (or tagging) each document with a set of
category label keywords. Category labels are commonly used in commercial and scientific
text collections and information feeds, and provide a high level summary for the content
of the document. For example, articles in high-tech domains may be annotated with sets of
keywords such as {IBM, product announcement, Power PC} and {Motorola,
patent, cellular phone}. The annotation of documents with category labels may be either
manual or automatic. Automatic text categorization has recently been the focus of
substantial research in the IR and text processing communities (e.g. Apte et al
1994; Finch 1994; Iwayama and Tokunaga 1994). Altogether, we assume that having the
documents of the collection annotated with category labels is a reasonable pre-requisite
for the KDT system, which would hold for many text collections in the market.
KDT also requires that the category keywords be organized in a hierarchical structure. This keyword hierarchy is a directed acyclic graph (DAG) of terms, where each of the terms is identified by a unique name. Figure A shows a portion of an example keyword hierarchy, the one used in our work with the Reuters data (see below), which will serve as a running example throughout this paper. In such a hierarchy an arc from A to B denotes that A is a more general term than B (e.g., countries → G7 → Japan). We use a general DAG rather than a tree structure so that a keyword may belong to several parent nodes (e.g. Germany is both a European-Community and a G7 country).
It should be emphasized that the sole purpose of the keyword hierarchy is to enable generalizations and partitioning of KDD findings over sibling nodes. The structure of the hierarchy is typically simple, and reflects the basic generalizations common for the domain of interest. Such keyword hierarchies are commonly used by information providers
(e.g. the Dialog service of Knight Ridder Information Inc. or the First service of
Individual Inc.), and resemble in form a “subject index” in a yellow pages book.
Rich hierarchies have been developed for several professional domains, such as the
Medical Subject Heading (MeSH) hierarchy, and have been used to assist and
augment free-text searching. The task of constructing, obtaining and modifying such
hierarchies is thus relatively easy, and should not be confused with the task of
constructing a semantically rich structure, such as a semantic network or a taxonomy
in the “knowledge representation” sense. The KDT system provides a simple GUI for
constructing and editing the hierarchy, supporting additions, deletions and modifications of nodes and links (Figure A is a screen dump of the hierarchy maintenance editor).
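
To make the DAG structure concrete, here is a small Python sketch of a keyword hierarchy with arcs from general to specific terms. The class and method names are hypothetical. The cycle check is an assumption about how acyclicity might be enforced, since the text only states that the hierarchy is a DAG.

```python
from collections import defaultdict

class KeywordHierarchy:
    """Directed acyclic graph of uniquely named keywords."""

    def __init__(self):
        self.daughters = defaultdict(set)  # general term -> more specific terms
        self.parents = defaultdict(set)    # specific term -> more general terms

    def descendants(self, node):
        """All terms reachable from node by following arcs downward."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for daughter in self.daughters.get(current, ()):
                if daughter not in seen:
                    seen.add(daughter)
                    stack.append(daughter)
        return seen

    def add_arc(self, general, specific):
        """Add an arc general -> specific, refusing arcs that create a cycle."""
        if general == specific or general in self.descendants(specific):
            raise ValueError(f"arc {general} -> {specific} would create a cycle")
        self.daughters[general].add(specific)
        self.parents[specific].add(general)

    def siblings(self, parent):
        """All daughters of parent, i.e. a node together with its siblings."""
        return set(self.daughters.get(parent, ()))

h = KeywordHierarchy()
h.add_arc("countries", "G7")
h.add_arc("countries", "European-Community")
h.add_arc("G7", "Japan")
h.add_arc("G7", "Germany")
h.add_arc("European-Community", "Germany")  # Germany has two parents

print(sorted(h.parents["Germany"]))
print(sorted(h.siblings("G7")))
```

Keeping both `daughters` and `parents` maps makes it cheap to move in either direction, which is what sibling-based comparisons and clustering by parent node both need.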


Introduction

Our general KDT framework was initially presented in (Feldman and Dagan 1995). The current paper both extends the KDT framework and its set of operations, and presents our interactive prototype system, which was implemented in Visual Prolog under Microsoft MS-Windows (all subsequent figures are screen dumps of this system). The system’s mode of operation involves three major steps:
1. Load input documents, annotated with keywords selected from a pre-existing
hierarchy of meaningful category labels.
2. Compute the various co-occurrence frequencies of these keywords within the documents in the collection (typically performed as a pre-processing step).
3. Provide interactive tools that allow access to documents, discover patterns across documents, and perform other similar KDD operations, based on the co-occurrence frequencies computed in the previous step.
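
Step 2 above can be sketched in Python as a single pass that counts how often each keyword, and each pair of keywords, annotates a document. This is only an illustration under the assumption that pairwise counts are what is pre-computed. The function names and the sample documents are hypothetical, and the original system was written in Visual Prolog, not Python.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count single keywords and unordered keyword pairs per document."""
    single = Counter()
    pair = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))        # ignore duplicate annotations
        single.update(unique)
        pair.update(combinations(unique, 2))  # pairs are stored in sorted order
    return single, pair

def pair_count(pair, a, b):
    """Look up a pair regardless of the order in which it is given."""
    return pair[tuple(sorted((a, b)))]

# Hypothetical annotated documents (step 1 would load these).
docs = [
    ["UK", "USA", "trade"],
    ["UK", "trade"],
    ["USA", "crude"],
]

single, pair = cooccurrence_counts(docs)

# Proportion of UK articles that are also annotated with trade.
uk_trade = pair_count(pair, "trade", "UK") / single["UK"]
print(f"trade among UK articles: {uk_trade:.0%}")
```

Once these tables exist, the interactive tools of step 3 can answer proportion queries with dictionary lookups instead of rescanning the collection.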


Introduction

2. Comparing Document Distributions: KDT can compare the distributions of keywords in two collections of documents containing similar keywords and display the results using tables and graphs.
3. Trend Analysis: KDT can compare the distributions of keywords in documents from different points in time and display the results using tables and graphs.
4. Association Discovery: KDT can search for several types of associations (e.g. Toivonen et al, 1995) between classes of documents.
5. Further, KDT includes a browsing facility in which the user can click on any discovered pattern and get the list of documents that contributed to the pattern.

These operations can assist users who have to analyze and assimilate information
spanning a large number of documents, such as in business intelligence and economic analysis. For example, using the system an analyst can quickly find out the
most active economic areas for certain countries, or major products of companies.
Furthermore, the analyst can compare such a company “profile” to profiles of other
companies in the same business area, and discover distinguishing aspects in the activity
of the company. In business intelligence applications, the user may be interested in
comparing profiles of different companies to identify relatively weak and strong areas in
their activity, while in marketing applications an analyst may want to compare country
profiles when looking for appropriate international markets for a product. Other
types of KDT queries can answer questions like “find economic areas which are
dominant in the economies of some (unspecified) countries”, or “find economic
areas in which activity has increased or decreased in a specified period of time”.
Investigating issues such as those mentioned above is not supported directly in
conventional information retrieval systems, and typically requires a lot of manual effort
in retrieving and analyzing a large number of documents.
It should be kept in mind that the answers to all KDT queries rely on document frequencies in some information source (such as a newswire or a professional magazine), which may introduce quantitative biases with respect to the real situation described in the texts. For example, an interesting story, from the media’s point of view, may be covered in a large number of articles, inflating the statistics of some items. To support verification of KDT’s findings, and to gain further insights into them, the system provides a direct link from the results of each query to the documents which support that result.


Introduction

Traditional database query tools allow a user to retrieve records based on the content of each record in isolation. In a hospital database, for example, a user might request all records for hospital stays that are less than one day with a cost greater than $10,000. Each retrieved record is selected because the information in that record, independent of any other record, satisfies the user's query. In contrast, KDD work provides tools for accessing information based on patterns appearing across records. For example, KDD tools might provide a user the ability to ask for records of patients whose medical care for some illness is much higher than typical (where "typical" is implicitly defined by the values of other records in the database), or to investigate whether there exist statistical patterns relating the length of patients’ hospital stay and their family circumstances (whether the patient is married, how many children the patient has, etc.). Although the goal of KDD work is to provide access to patterns and information in online information collections, most efforts have focused on knowledge discovery in
structured databases. However, a tremendous amount of online information
appears only in collections of unstructured text. Most research in Information Retrieval
(IR) has developed methods for providing access to documents based on the information
contained in a document in isolation (analogous to what traditional database query tools provide for databases). In this case, it is assumed that the user knows in advance the
topic of documents of interest. Clustering methods were used to impose structure over a
collection of documents, enabling the user to browse through the collection and select clusters of documents of interest (e.g. Salton 1989; Cutting et al, 1993). Visualization
methods were also used for presenting some additional structures hidden in a document or a set of documents (Williamson and Shneiderman, 1992; Hearst 1995). However,
there has been little work on providing KDD-style tools for browsing and analyzing text
collections based on information appearing across documents. Applying such tools to
texts means that the system would take an active role in suggesting topics of interest to
the user, as well as supply new browsing methods that rely on inter-document
information. A KDD framework for texts may thus be viewed as an intermediate point
between user-specified retrieval queries and unsupervised document clustering: the user
typically provides some guidance to the system about the type of patterns of interest,
but then the system makes unsupervised decisions in finding specific statistically
motivated patterns. This paper describes the Knowledge Discovery in Texts (KDT) system, which applies a novel knowledge discovery framework to textual databases. Our goal is to provide similar types of KDD operations previously provided for structured databases. To do so, we rely on a text-categorization paradigm where each document is labeled with sets of keywords, where each keyword comes from a hierarchy of terms. Unlike in traditional IR work, where keywords (category labels) are used in specification of retrieval (or routing) queries, KDT allows a user to access documents and recognize
patterns across them based on the observed co-occurrence distributions of keywords in
documents of the collection. A key insight in this work is that keyword co-occurrence
frequencies (or distributions) can provide the foundation for a wide range of KDD
operations on collections of textual documents, including:

1. Summarization and Browsing: KDT allows the user to view the frequency of occurrence of keywords from some category in a collection of documents that contain particular keywords from some other category, and to browse the collection of documents based on these frequencies.
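To make the summarization and browsing operation concrete, here is a minimal sketch that counts how often keywords from one category appear in documents containing a given keyword from another category, and keeps the matching documents so a user could browse them. The two-category hierarchy, the sample documents, and the function name are hypothetical; they stand in for the keyword hierarchy the text describes and are not taken from KDT.

```python
from collections import Counter

# Hypothetical keyword hierarchy, flattened to keyword -> category.
CATEGORY_OF = {
    "usa": "country", "japan": "country", "iran": "country",
    "trade": "topic", "grain": "topic", "crude": "topic",
}


def cooccurrence_counts(docs, given_keyword, target_category):
    """Count target-category keywords in documents labeled with given_keyword.

    Returns the counts together with the documents that contributed to them,
    so each count can be traced back to its supporting documents.
    """
    matching = [labels for labels in docs if given_keyword in labels]
    counts = Counter(
        kw
        for labels in matching
        for kw in labels
        if CATEGORY_OF.get(kw) == target_category
    )
    return counts, matching


# Each document is represented only by its set of keyword labels.
docs = [
    {"usa", "trade"},
    {"usa", "grain"},
    {"japan", "trade"},
    {"usa", "trade", "crude"},
]

counts, supporting = cooccurrence_counts(docs, "usa", "topic")
print(counts.most_common())
print(len(supporting), "supporting documents")
```

Returning the supporting documents alongside the counts mirrors the idea of linking every discovered pattern back to the documents behind it.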



Introduction

Traditional databases store large collections of information in the form of structured records, and provide methods for querying the database to obtain all records whose
content satisfies the user's query. More recently, however, researchers in Knowledge Discovery in Databases (KDD) have provided a new family of tools for accessing information in databases (e.g. Brachman et al, 1993; Frawley et al, 1991; Kloesgen,
1992; Kloesgen, 1995b; Ezawa and Norton, 1995). The goal of KDD has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful
information from given data" (Piatetsky-Shapiro and Frawley 1991). Work in this area includes applying machine-learning and statistical-analysis techniques towards the automatic discovery of patterns in databases, as well as providing user-guided environments for exploration of data.



Keyword Analysis of Large

Abstract
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Texts. It is built on top of a text-categorization paradigm where text articles are annotated with keywords organized in a hierarchical structure. Knowledge discovery is performed by analyzing the co-occurrence frequencies of keywords from this hierarchy in the various documents. We show how this term-frequency approach supports a range of KDD operations, providing a general framework for knowledge discovery and exploration in collections of unstructured text.

December 16, 2013

50 Government Site

http://www.dhs.gov/journal/leadership/

http://www.ncptt.nps.gov/

http://forum.webcontent.gov/news/

http://ahc.ehawaii.gov/openedit/blog/

http://blog.gallegly.house.gov/

http://www.volunteerlouisiana.gov/blog/

http://www.nysenate.gov/markup/

http://www.rib.uscourts.gov/externalkb20/

http://www.ncbi.nlm.nih.gov/pubmed

http://www.tsa.gov/blog/

http://teens.drugabuse.gov/blog/

http://www.geostor.arkansas.gov/G6/Home.html

http://townhall.virginia.gov/L/Forums.cfm

http://blog.pittsburghsummit.gov/index.php/site/index/

http://www.majorityleader.gov/blog/blog.cfm

http://www.connectcontest.state.gov/contests/change-your-climate-change-our-world

http://cce.nasa.gov/

http://blog.library.ca.gov/

http://your.kingcounty.gov/votes/

http://www.burton.house.gov/posts

http://www.nlc.nebraska.gov/system/republican/

http://ideas.cio.ny.gov/

http://blogs.loc.gov/

http://tokyo.usembassy.gov/zblog/e/zblog-emain.html

http://libraries.idaho.gov/blogs/anne-abrams

Wordpress Pinging List

http://rpc.pingomatic.com/

http://api.feedster.com/ping

http://api.moreover.com/ping

http://api.my.yahoo.com/rss/ping

http://audiorpc.weblogs.com/RPC2

http://bblog.com/ping.php

http://bitacoles.net/notificacio.php

http://bitacoles.net/ping.php

http://bitacoras.net/ping

http://blo.gs/ping.php

http://blog.goo.ne.jp

http://blog.goo.ne.jp/XMLRPC

http://blogdb.jp

http://blogdb.jp/xmlrpc

http://blogdigger.com/RPC2

http://blogpeople.net/ping

edu site list

PageRank(5), http://blogs.berkeley.edu/2009/11/03/there-is-strong-research-showing-fat-people-can-be-healthy-and-fit/

PageRank(3), http://evidencebasedliving.human.cornell.edu/2011/02/another-evidence-based-diet-tip-keep-a-diary/

PageRank(1), http://rushinperson.rush.edu/2011/02/23/determining-your-target-heart-rate/

PageRank(4), http://blogs.berkeley.edu/2010/03/04/intermediaries-and-weight-loss-scams/

PageRank(4), http://blogs.bgsu.edu/amdunn/

PageRank(3), http://www.dartmouth.edu/~jacko/v2/2011/04/6-quick-tips-to-lose-weight-and-stick-it-to-that-bitch-julie-sanders/

PageRank(4), http://evidencebasedliving.human.cornell.edu/category/learningcenter/

PageRank(2), http://blogs.utk.edu/whitney/

100 High PR Press Release Sites

1 www.1888pressrelease.com

2 www.24-7pressrelease.com

3 www.addpr.com

4 www.afly.com

5 www.afreego.com

6 www.africanewswire.net

7 www.baygo.com

8 www.betanews.com

9 www.bignews.biz

10 www.billhartzer.com

11 www.bizeurope.com

12 www.business.pagerelease.com

13 www.businessopportunity.biz

14 www.businessportal24.com

15 www.businesswire.com

16 www.cdn-news.com

17 www.cgidir.com

18 www.click2newsites.com

19 www.clickpress.com

20 www.creativecommons.org

21 www.dbusiness.com

22 www.devarchive.com

23 www.devpress.com

24 www.dime-co.com

25 www.dir.free-press-release.com

Back linking Submission Site

High PR Web site list for Back linking :
http://tabulas.com
http://tblog.com
http://terapad.com
http://shoutpost.com
http://bloghi.com
http://mynewblog.com
http://netcipia.com
http://wikyblog.com
http://friendster.com
http://bigadda.com
http://blog.co.uk
http://blog.com
http://www.zimbio.com
http://gather.com
http://diaryland.com
http://edublogs.org
http://opendiary.com
http://sosblog.com
http://20six.co.uk
http://blog.ca
http://thoughts.com
http://blogskinny.com

100 Bookmarking Site list :

1     http://google.com/bookmarks     Submit     10
2     http://slashdot.org     Submit     9
3     http://digg.com     Submit     8
4     http://stumbleupon.com     Submit     8
5     http://squidoo.com     Submit     8
6     http://reddit.com     Submit     8
7     http://mixx.com     Submit     8
8     http://technorati.com     Submit     8
9     http://folkd.com     Submit     8
10     http://mister-wong.com     Submit     8
11     http://delicious.com     Submit     8
12     http://seofor24.com     Submit     7
13     http://propeller.com     Submit     7
14     http://metafilter.com     Submit     7
15     http://diigo.com     Submit     7
16     http://wikio.com     Submit     7
17     http://blogcatalog.com     Submit     7
18     http://blinklist.com     Submit     7
19     http://newsvine.com     Submit     7
20     http://connotea.org     Submit     7
21     http://sphere.com     Submit     7
22     http://citeulike.org     Submit     7
23     http://hotklix.com     Submit     6
24     http://jumptags.com     Submit     6
25     http://dzone.com     Submit     6

December 02, 2013

Search Engine Optimization (SEO) oDesk Test Answer 2013


1.   Which of the following can be termed good keyword selection and placement strategies?
Ans: a. targeting synonyms of the main keyword
d. Optimizing five or more keywords per page

 2.   What will happen if you type the words Certification-Networking in the Google search box?
Ans: Google will find all the web pages in which the words certification and networking  appear together

 3.   Google displays up to _______ characters of a webpage's Title tag.
Ans: 66
 
 4.   What is the function of the indexer part of the Google search engine?
Ans: It helps to retrieve web pages from the World Wide Web

 5.   Which of the following statements about Google's technical and quality guidelines is true?
Ans: Avoid using robots.txt to prevent crawling of search results pages or other auto-generated pages, even if they don’t add much value for users coming from the search engines.

 6.   What is the most likely time period required for getting a Google page rank?
Ans: More Than 3 months.

 7.   Which of the following is an ethical SEO Technique?
Ans: None of the above.

 8.   What is Anchor Text?

Ans: It is the visible text that is hyperlinked to another page

 9.   Google looks down upon paid links for enhancing page rank. If a website sells links, what action does Google recommend to avoid being penalized?
Ans: Paid links should be disclosed through index-nofollow attribute in the hyperlink

10.   What is the name of the search engine technology due to which a query for the word ‘actor’ will also show search results for related words such as actress, acting or act?
Ans: RSD (real time synonym detection)

11.   If you search for the term “iq test” in the word tracker keyword suggestion tool, will it return the number of independent searches for the term “iq” Yes or no?
Ans: no

12.   Which search engine technology prevents a website from appearing in numerous top positions in the search engine result pages?
Ans: clustering

13.   If a website's search engine saturation with respect to a particular search engine is 20%, what does it mean?
Ans: Only 20% of the pages of the website will be indexed by the search engine

14.   Implementing 301 permanent redirects on the old pages to redirect to the new pages is a good idea from the SEO perspective. True or False?
Ans: True

15.   Which black hat SEO technique is characterized by a method to deceive search engines, by detecting the search engine bot and “feeding” it with different HTML code than the HTML actually served to users?
Ans: Cloaking

16.   Are RSS/Atom feeds returned in Google search results?
Ans: No

17.   All major search engines are case sensitive.
Ans: False

18.   Cloaking is a controversial SEO technique. What does it involve?
Ans: Offering a different set of web pages to the search engines.

19.   Which of the following statements is correct with regard to the description Meta tag?
Ans: Google uses the Meta description to generate the search engine results pages.

20.   Which of the following statements about search engine optimization techniques are correct?
Ans: Websites with deep linking are looked at favorably by search engines.

21.   Which of the following factors have an impact on the Google page rank?
Ans: The subject matter of the site providing the inbound link to a page of a website

22.   If you enter “help site: http://www.arsbd.blogspot.com/” in the Google search box, what will Google search for?
Ans: It will find pages about help within http://arsbd.blogspot.com/.


23.   Which of the following website design guidelines have been recommended by Google?
Ans: Every page should be reachable from at least one static text link.

24.   Which of the following statements regarding website content are correct?
Ans: If you have two versions of a document on your website, Google recommends that you only allow the indexing of the better version.

25.   Which of the following statements about the RSS are correct?
Ans: It is a form of XML

26.   What is the main reason for the effectiveness of optimizing a web page towards a long tail keyword search?
Ans: There are fewer websites that have targeted such keywords.

27.   How are the site maps important for the search engine optimization process?
Ans: Site maps help the search engine spider pick up more pages from the website.

28.   Which of the following actions could get you banned by Google?
Ans: None of the above.

29.   Do search engines such as Google detect unnatural linking patterns by checking the Class C IP address of the page where the link to the website originates? Yes or no?
Ans: Yes

30.   What is the term for optimization strategies that are in an unknown area of reputability/validity?
Ans: Silver hat techniques.

31.   What does the term keyword prominence refer to?
Ans: It refers to the fact that the keywords placed in important parts of a webpage are given priority by the search engines.

32.   Some words, when followed by a colon, have special meanings to Yahoo. What is performed by the link: operator?

a.    It shows all the outbound links from the URL
b.   It shows how many pages of the site Yahoo is pointing to
c.    It shows all the pages that point to that URL (Ans)
d.   It shows URLs with broken links

33.   Which black hat SEO technique is characterized by a method to deceive search engines, by detecting the search engine bot and “feeding” it with different HTML code than the HTML actually served to users?

a.   Coaling
b.   Foisting
c.   Slighting
d.   Cloaking (Ans)

34.   Why is it a bad idea from the SEO perspective to host free articles and write-ups that are very common on the internet?

a.    Because they will not lead to fresh traffic
b.   Because you could be penalized by search engines for using duplicate content (Ans)
c.    Because you will not get the benefits of proper keyword targeting
d.   Because people could turn up claiming copyright infringement

35.   What will happen if you type the word ‘Certification-Networking’ in the Google search box?

a.    Google will find the web pages about “certification” and also containing the word “networking”
b.   Google will find ALL the web pages containing the words “Certification” and “Networking”
c.    Google will find ALL the web pages in which the words “Certification” and “Networking” appear together. (Ans)
d.   Google will find the web pages about Certification that do not contain the word Networking

36.   Which of the following factors does Google take into account while assessing whether or not a website is an authority website?

a.    The frequency with which the contents of the website is updated
b.   the number of web pages containing relevant information on the main theme of the website
c.    The number of in-bound natural links related to the website’s theme (or keywords)(Ans)
d.   None of the above

37.   What is keyword density?

a.    The number of times the keyword is used / (divided by) the total word count on page – (minus) the total words in HTML on the page
b.   The number of times the keyword is used x (multiply by) the total word count on page.
c.    The number of times the keyword is used in the page description
d.   The number of times the keyword is used in the page title
e.    The number of times the keyword is used / (divided by) the total word count on the page. (Ans)
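The definition marked as the answer above (keyword occurrences divided by the total word count on the page) can be sketched in a few lines. The tokenizer, the lower-casing, and the restriction to single-word keywords are simplifying assumptions for this example, not part of the test question.

```python
import re


def keyword_density(text, keyword):
    """Occurrences of a single-word keyword divided by the total word count."""
    # Assumption: words are runs of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


sample = "SEO tips: good SEO starts with content, not SEO tricks."
print(f"{keyword_density(sample, 'seo'):.0%}")
```

Here the sample has 10 words and "seo" appears 3 times, so the density works out to 3 / 10.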

38.   Which of the following statements about FFA pages are true?

a.    They are greatly beneficial to SEO
b.   They are also called Link Farms (Ans)
c.    They are Paid Listings
d.   They contain numerous inbound links

39.   What is the illegal act of copying of a page by unauthorized parties in order to filter off traffic to another site called?

a.    Traffic jacking
b.   Visitors Jacking
c.    View Jacking
d.   Page Jacking (Ans)

40.   What is the most likely time period required for getting a Google page ranking?

a.    1 week
b.   3 weeks
c.    1 month
d.   More than 3 months (Ans)

41.   Which of the following can be termed as a good keyword selection and placement strategies?

a.    Targeting synonyms of the main keyword (Ans) A & D
b.   Targeting the highest searched keywords only
c.    Copying competitor keywords
d.   Optimizing 5 or more keywords per page (check) (Ans)

42. Which of the following factors contribute towards link popularity of a website?
c.     The number of pages in the website

43. Search engines do not index some common words (such as “or”, “and”, “when”, and “in”) within the webpage. What are these common words called?
d.     Stop words

44. If you search for the term “iq test” in the Word Tracker keyword suggestion tool, will it return the number of independent searches for the term “iq”?
a.     Yes

45. Which of the following search engines or directories provides the directory search results for Yahoo?
b. Yahoo Directory

46. Which of the following statements regarding website content are correct?
c. Syndicating your content could lead to Google viewing the material as duplicate

47. Which of the following statements about the Google Sitemap are correct?
c. The Sitemap acceptable to Google follows an XML format

48. While grading your website, Google gives credit to outbound links pointing to authority websites?
b. False

49. Which of the following URLs can the Google search engine spider?
e. All of the above

50. What does the 302 server response code signify?
d. The page has temporarily moved
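Questions 14 and 50 both turn on the difference between permanent and temporary redirects. As a small illustration, the sketch below uses Python's standard `http.HTTPStatus` enum to label a status code; the helper name `describe_redirect` and its three labels are made up for this example.

```python
from http import HTTPStatus

PERMANENT = {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT}  # 301, 308
TEMPORARY = {HTTPStatus.FOUND, HTTPStatus.TEMPORARY_REDIRECT}              # 302, 307


def describe_redirect(code):
    """Label a known HTTP status code as a permanent or temporary redirect."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if status in PERMANENT:
        kind = "permanent redirect"
    elif status in TEMPORARY:
        kind = "temporary redirect"
    else:
        kind = "not a permanent or temporary redirect"
    return f"{status.value} {status.phrase}: {kind}"


for code in (301, 302, 200):
    print(describe_redirect(code))
```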


December 01, 2013

WP-Email

30. WP-Email
This WordPress plugin allows people to recommend or send your blog posts or pages to friends via email. This helps you get more visitors.

WP Flip Page

29. WP Flip Page
WP Flip Page provides a simple and effortless way to take any PDF document, ebook, or set of image files and turn them into stunning, interactive 3D flipping books with page-flipping animations that can be embedded directly into your blog! Increase the "stickiness" of your blog by creating valuable, useful, interactive content such as digital books, reports, magazines, picture portfolios and catalogues that behave like real life paper books without any advanced programming work!

Popularity Contest

28. Popularity Contest
Popularity Contest keeps a count of your post, category and archive views, comments, trackbacks, etc. and uses them to determine which of your posts are most popular. There are numerical values assigned to each type of view and feedback; these are used to create a 'popularity score' for each post.
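The scoring idea described above, a numerical value per type of view or feedback combined into one score per post, amounts to a weighted sum. The sketch below shows that idea only; the event names, the weights, and the sample statistics are invented for illustration and are not the plugin's actual values or code.

```python
# Hypothetical weights: how much each kind of event contributes to the score.
WEIGHTS = {
    "post_view": 1,
    "category_view": 1,
    "archive_view": 1,
    "comment": 10,
    "trackback": 20,
}


def popularity_score(stats, weights=WEIGHTS):
    """Weighted sum of event counts; unknown event types contribute nothing."""
    return sum(weights.get(kind, 0) * count for kind, count in stats.items())


# Hypothetical per-post event counts.
posts = {
    "hello-world": {"post_view": 100, "comment": 4, "trackback": 1},
    "about": {"post_view": 150, "comment": 0},
}

ranked = sorted(posts, key=lambda slug: popularity_score(posts[slug]), reverse=True)
for slug in ranked:
    print(slug, popularity_score(posts[slug]))
```

With these made-up weights, a post with fewer views but more comments and trackbacks can outrank a post that was only viewed.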

Link Within

27. LinkWithin
LinkWithin is a blog widget that shows related stories from your blog archive under each post and links to them. This makes your older posts accessible to new or casual readers of your blog who otherwise would not encounter them. Moreover, as the widget only links to posts that are relevant to the readers of a particular post, it keeps them engaged with your blog for a longer period of time.

GD Star Rating

25. GD Star Rating
GD Star Rating plugin allows you to set up rating and review system for posts, pages and comments in your blog. You can set many options for displaying the rating stars,
and add widgets into the sidebars for displaying top ratings and other statistics
generated by the plugin. The plugin supports multiple languages such as English,
Serbian, Spanish, Hindi, German, Italian, French, Ukrainian, Belorussian, Russian,
Polish.

CommentLuv

26. CommentLuv
The CommentLuv plugin for WordPress fetches the feed found at the commentator's site URL while they type their comment, extracts the title and link of their last blog post, and displays it below the comment form. When they submit their comment, the last blog post link gets added at the end of their comment for all to see! This gives your web visitors more reason to leave a comment on your site.

Get Recent Comments

24. Get Recent Comments
This plugin displays excerpts of the most recent comments and/or trackbacks that have
been posted to the articles in your blog, in your sidebar. You can let the plugin order the comments by the corresponding post, or simply order them by date. The plugin can
(optionally) separate the trackbacks/pingbacks from the comments.

Subscribe to Comments

23. Subscribe to Comments
Subscribe to Comments is a robust plugin that enables commentators to sign up for e-mail notification of subsequent comments left for the post. The plugin includes a full-featured subscription manager that your commentators can use to unsubscribe to certain posts, block all notifications, or even change their notification e-mail address!

WP-Polls

22. WP-Polls
WP-Polls is a Poll plugin that is extremely customizable using its in-built templates and css styles. There are tons of options for you to choose to ensure that the polls run the way you want. It encourages interaction on your site as well as allows you to collect feedback from your Wordpress site visitors.

WordPress Auto-Translator Plugin

21. WordPress Auto-Translator Plugin
This is a Wordpress auto translation plugin that uses IP address technology. Once installed, the plugin can translate all Wordpress pages including widgets. It is Google Translator API based and supports 50 languages, including Italian, Korean, Chinese, Portuguese, German, French, Spanish, Japanese, Hebrew, Indonesian and many more.

Pingback Optimizer

20. Pingback Optimizer
Optimize Your Pingbacks! - Automatically submit your pingback URLs to multiple RSS directories. Boost your search engine rankings and get more traffic!

WP Simplicity

19. WP Simplicity
WP Simplicity is an amazing blogging tool, which allows you to leverage your time and your income. It is developed with the most advanced features to offer time-saving solutions and transform blogging experience. It eliminates the headache of creating and managing Wordpress sites.

Subscribe 2

18. Subscribe 2
Subscribe2 provides a comprehensive subscription management and email notification
system for WordPress blogs that sends email notifications to a list of subscribers when you publish new content to your blog. Email Notifications can be sent on a per-post basis or periodically in a Digest email.

Email Users

17. Email Users
A plugin for wordpress which allows you to send an email to the registered blog users. Users can send personal emails to each other. Power users can email groups of users and even notify group of users of posts.

Easy Privacy Policy

16. Easy Privacy Policy
Google requires a Privacy Policy on any site displaying Google Adsense ads. The same thing applies under the Terms of Service of a lot of affiliate programs. By having a Privacy Policy on your site, you take care of this requirement. There is an easy way: just use the Easy Privacy Policy plugin.

Exclude Pages

15. Exclude Pages
This plugin comes in handy when you want to create pages that you don't want the general visitors on your site to view. This means that you can specify pages which will only be accessible through their direct URL and not visible on the main page.

Google Web Fonts for WordPress

14. Google Web Fonts for WordPress
Google Fonts Pro is a premium WordPress plugin that allows you to instantly access over
200 of Google’s Web Fonts with power and ease. You can take full control over the font and
typography used on your site in a matter of minutes!

W3 Total Cache

13. W3 Total Cache
W3 Total Cache improves the user experience of your site by improving your server performance, caching every aspect of your site, reducing the download times and providing transparent content delivery network (CDN) integration. By using this plugin you will speed up your WordPress blog significantly.

Gravity Forms

12. Gravity Forms
Gravity Forms lets you build and publish your WordPress forms in just minutes. No drudgery, just quick and easy form-building. Select your fields, configure your options and easily embed forms on your WordPress powered site using the built-in tools.

Scaleable Contact Form

11. Scaleable Contact Form
This is a contact form plugin which creates a very customizable contact form on your
Wordpress blog. It uses Captcha to prevent spam submissions and you can configure
required, non-required fields.

Fast Secure Contact Form

10. Fast Secure Contact Form
This plugin allows you to easily create and add contact forms on
your WordPress site. The contact form will let the user send emails
to a site's admin. An administration panel is present, where the
webmaster can create and preview unlimited forms.

Contact Form 7

9. Contact Form 7
Contact Form 7 can manage multiple contact forms, plus you can customize
the form and the mail contents flexibly with simple mark up. The form
supports Ajax-powered submitting, CAPTCHA, Akismet spam filtering and
so on.

Quick Shop

8. Quick Shop
Quick Shop supports any WordPress that has the sidebar widgets installed. It adds a SideBar widget that shows the user what they currently have in the cart and allows them to remove the items, not to mention a TinyMCE button to easily allow you to add products to your
posts/pages.

Wordpress Dealpon

7. Wordpress Dealpon
WP Dealpon is a completely automated, turn-key wordpress plugin that will let you run automatically controlled special offers right from your wordpress blog. You can run special offers for your products, resurrect older offers with a new, special discount and improve your promotional campaign for your products.

Cart66

6. Cart66
Cart66 allows you to sell digital and physical products, collect recurring payments for
subscriptions, memberships, and payment plans. You can even build a complete
membership site with the plugin and set up coupon codes for promotions.

GetShopped

5. GetShopped
The WP e-Commerce Plugin is offered under GetShopped and is a shop owner’s dream. It
offers a fully integrated shopping cart application suitable for selling your digital and
tangible products, services, and/or fees online. It can be integrated with Google Checkout,
DPS / Paystation, Paypal and offers an audio player module for selling mp3 files and digital
downloads. Its functionality can be extended by using add-on plugins.

uPricing

4. uPricing - Pricing Table for Wordpress
uPricing is a pricing table for WordPress. It is a standalone plugin that allows you to create a pricing table to list your products in.
Perfect for comparing products across their features and price.

PayPal Payment Terminal Wordpress

3. PayPal Payment Terminal Wordpress
With its super easy installation, PayPal Payment Terminal Wordpress allows you to start accepting PayPal payments on your WordPress site.
You can display it as a widget or on any page/post, view and sort all transactions and send
out automatic notifications to your customers and yourself.

Shopp

2. Shopp
Shopp is an e-commerce plugin that adds a feature-rich online store
to your WordPress-powered website or blog. Get your store up and
running in minutes. Then, take it to the next level with the flexibility
of Shopp’s customization tools that allow production studios and power users to tailor every
aspect of the shopping experience.

Eshop

1. Eshop
eShop is an accessible shopping cart plugin for WordPress. It is packed with various features,
including basic statistics, sales data, various payment options, various methods to list
products, basic stock control and various discount options. Most importantly, it can be
linked with multiple merchant gateways including Authorize.net, Paypal, Payson,
eProcessing Network, Webtopay, iDEAL and Cash/Cheque!

Top 118 Wordpress Plugins

Introduction
The name of this report is a bit of a misnomer, as there are actually 118 plugins
listed. The graphics were prepared earlier, when I figured 100 would be the number to stop
at, but there are just so many fantastic plugins available for WordPress that for the sake of
completeness, I reckoned I had to mention some additional plugins.
Anyway, this is now a pretty complete list of highly recommended plugins covering various
aspects of blog building.
If you’re not familiar with the latest version of WordPress, there’s a set of 30 videos here
that will teach you everything you need to know about building blogs with WordPress
(Master Resale Rights are included):

Set up a new computer

8. Now find and double-click the pcdecrapifier program. It will take a little while to start, as it is running from a USB drive rather than the hard drive. When it comes up, it will list all the programs currently installed on the computer. Go through this list and check the shareware items you want to uninstall, that is, the programs you do not want installed and do not plan to pay for later if you don’t already have a license. If you plan on using Microsoft Office, have the CD and a license for it, and a version is listed here, I would suggest uninstalling the listed version along with its registration/activation program. If you leave it in and install your own version later, you will run into problems and conflicts when you try to install from the CD, so it is better to uninstall it here first than to spend three or more days later trying to figure out why you can’t install your version from the CD like you did before. After the programs are uninstalled, it should ask you to reboot. Reboot no matter what, and when the desktop comes back up, reboot again to make sure the registry settles into a working state. That way, if something goes wrong as we go forward, you can reboot and hopefully have a known good configuration to boot with.
9. Now that all the programs are uninstalled and you have rebooted at least twice to make sure everything is out of the registry, it is almost time to install the programs you are going to use. There are a few things to do first, though, so please don’t get impatient and install everything yet. The first thing to do is install a defrag program and an image creation program. I always use the Auslogics disk defragger, as it is rated as one of the best and it is free of charge, which makes it even better. I have not had much luck with image creation programs: when I needed them to restore the system, the ones I had did not work, and I ended up reinstalling from scratch every time. Some people use, like and recommend Acronis True Image, but it costs money. If you don’t have the money to spend, try either Macrium Reflect or Paragon Backup & Recovery free edition. I have these last two installed and have made images with them and with the system image option in Windows Backup and Restore, so I have three images, and hopefully one of them will get the system back up and running if I ever need it. These last three are free. When you make the image, be sure to check the verify-when-done option so the program checks that the image it created is good for when you need it later on.
10. Now that the defrag and image creation software is installed, the first thing to do is defrag the hard drive. This might take a while, as we have uninstalled a lot of programs and installed a couple. When the defrag is done, reboot the computer.
11. Now we are going to make an image of the system before we go any further, so that we don’t have to start all over if something happens and totally messes up the computer to the point where you cannot get anything done or, in the worst case, it will not boot properly. Unfortunately, I don’t know of a way to test the image you just created other than using the program to restore it to the hard drive. You can do that if you want to. I have not, which is why I really don’t know whether my image backups work. It is a good idea, but it takes time, and I cannot write up the procedure for it here. Make sure you make the recovery disk with the program so that you have a bootable CD to restore the image from. If you don’t, you cannot use the program to restore the image when the computer won’t boot, and you will have to do a complete reinstall just to install the program for the restore, so save yourself some time and do this.
12. Now is the time to install the antivirus program, the antimalware program and the firewall program you are going to use. And no, it is not time to hook up the cable for the internet connection yet: you will most likely get hit with malware and viruses before you get anything else done, and then you have to get rid of them before you can use your system for what you want to do with it.
13. Now that we have the antivirus, antimalware and firewall system installed, create another disk image of the system before we hook up to the internet. Then be sure and reboot the computer.
14. Now we are going to hook up to the internet to update the computer with all the security updates and all other updates. I suggest using a router, as it gives you a little more protection between you and the internet and also makes networking easier. When you hook up the internet, I suggest running a cable from the router to the computer first. You can switch to wireless later, but a cable makes things easier right now because you don’t have to worry about cell phone or cordless phone traffic interfering with your updates. After the internet is hooked up, you have an IP address and you can reach a website, the first thing to do is update the antivirus and antimalware programs so they can protect you from the new threats that came out after the versions you just installed were released. After they are updated, you will have to reboot for them to work at their best, and I really suggest you do it now.
15. After you have rebooted and are back on the internet, go into the Control Panel or the programs listing and run a complete Windows Update. Install all security and critical updates first, as they will help protect you from the rotten stuff on the internet.
16. After Windows and the antivirus and antimalware programs are all updated, it is time to start installing all the other software that you want and are going to use. This is going to take some time, as you have to install the programs one at a time and most of them will tell you to reboot after the install. What I do is install three or four programs and then reboot, so I am not rebooting every five minutes or so. Some programs require other software, like a certain .NET Framework version, before you can install them. There is no way to know this until you start installing the program, unless the box states it or you have experience from installing it on other computers.
17. When you have all the programs installed, you will most likely have to do some configuring to get them the way you like. If you know you have to, do it now, then do a complete defrag of the system and be sure to reboot after.
18. Now we are going to do another image backup so we hopefully have the system captured the way we want it. Don’t worry if you later come up with other programs you want installed, or find out that you really don’t want one of the programs you installed after all. You can do more image backups later; just don’t overwrite the ones you have now, so you at least have a fairly good starting point to work from instead of having to reinstall everything from scratch. Reinstalling from scratch takes a while longer, and although you do get good at it after a while, it gets irritating doing it all the time. You should now have at least three image backups on your external hard drive. Hopefully you will never need them, but there is always a time you will.
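The verify option in the imaging program checks an image right after it is made, but it cannot tell you whether the file is still intact months later on the external drive. The following is only a rough sketch of one way to keep an eye on that, assuming Python is installed. The function names are made up for illustration and are not part of Acronis, Macrium, Paragon or Windows Backup. It records a SHA-256 checksum for each image file and later reports any file whose checksum has changed. A matching checksum only means the file has not changed; it does not prove the image will actually restore.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large image files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(image_paths):
    """Map each image file name to its current SHA-256 digest."""
    return {Path(p).name: sha256_of_file(p) for p in image_paths}


def changed_files(image_paths, recorded):
    """Return the names of files whose digest no longer matches the record."""
    current = record_checksums(image_paths)
    return sorted(name for name, digest in current.items()
                  if recorded.get(name) != digest)
```

You would call record_checksums once, right after making the images, save the result somewhere safe, and call changed_files with the same paths whenever you want to check the drive.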
Now that all that is done, I sure hope you enjoy your new computer system and don’t get hit with viruses or malware. Be sure to look out for phishing websites. At least IE8 and Firefox will try to help you out, as they have systems set up to warn you before you even get to sites that are known to be phishing sites.

How to set up a new computer

I have had many questions about, and many occasions for, setting up a new computer that someone had just bought either from a store or from a website and had delivered to their doorstep. The following is the procedure I use to make sure the system is ready to go online, where the internet will start trying to pull it into a botnet or hit it with all the viruses that are out there looking for new systems nobody has set up to stop them. I am by no means the expert. This is just the way I do it, and there are probably other ways and systems for doing it, but I don’t have access to them.
The first thing you need to do is get some programs handy and ready to install for when we finally get the computer up and running. Don’t hook it up to the internet right away, as there are some things that really, really need to be done first so that hopefully your computer doesn’t get hooked into a botnet that lets someone else do whatever they want with your system and steal all your financial and personal info. What we are going to do is no guarantee that you will not get caught up in one of these schemes, but it sure will help. The following is a list of software that you need to have available. A USB device makes things a lot easier, but it is not required; a CD with the software on it will also work, though it is a little slower than USB.
You will also need, or at least I suggest having, an external hard drive at least 4 times bigger than the hard drive installed in your computer. Use it for image backups, and later you can also use it to hold backups of all your personal files and settings if you like. They are fairly cheap and available at most electronics stores, or you can find them on the internet and have one or more shipped to your house.
a. Pcdecrapifier or Revo Uninstaller. I recommend installing one of these and using it instead of the Control Panel, as it removes extra leftover items in the registry and on the hard drive. You can download it from this page: http://allkeywordanalysis.blogspot.com/
b. Virus protection software like Avast or whichever one you want to use (I strongly recommend installing only one antivirus program, as two will get in each other’s way and could lock your computer up when you are trying to get something done).
c. Antimalware program like SuperAntiSpyware, Advanced System Care or WinPatrol (which I really recommend you install). The free version is good enough for all of these programs.
d. Image creation software like Acronis True Image, Macrium Reflect, or Paragon Backup & Recovery Free (I recommend using the free version at least until you find out whether you want to buy the program. Most paid programs give you some time to try before you buy, and I really recommend you try before buying any).
e. Software firewall program like PC Tools, ZoneAlarm or whichever one you want to use. I and a lot of other people recommend installing only one firewall, so you don’t end up with conflicts and lockups from firewalls competing with each other and stopping everything. I recommend a different one from the one that came with Windows, as that one lets too much through and most of the time does not check anything going out of your computer to the internet.
If you don’t know where to get this software, there are two things you can do: search the internet for it, or go to http://allkeywordanalysis.blogspot.com/ to get links to all the software and then download it from the manufacturer’s site so you are sure to get the version for your computer. Be sure to get the 64-bit software only for 64-bit computers, as it will not work on 32-bit computers at all. Check your version of Windows too, as some computer manufacturers put a 32-bit OS on a 64-bit computer, in which case you need the 32-bit version. You do not have to register for anything to use this website; all the reviews and lists are freely available to everyone.
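If you are not sure which version of Windows you have, here is a small sketch of one way to check, assuming Python is installed on the machine. It looks at the machine name reported by Python's platform.machine(), which on Windows should normally reflect the installed operating system rather than the processor, and picks the matching installer flavor. The function names and the list of 64-bit machine names are assumptions made for this example and are not exhaustive, so treat the answer as a hint and confirm it in the Windows System panel.

```python
import platform

# Machine names taken to mean a 64-bit operating system.
# This list is an assumption for the example and is not exhaustive.
SIXTY_FOUR_BIT_NAMES = {"amd64", "x86_64", "arm64", "aarch64", "ia64"}


def os_bits(machine_name):
    """Guess whether the OS is 32-bit or 64-bit from its machine name."""
    return 64 if machine_name.lower() in SIXTY_FOUR_BIT_NAMES else 32


def installer_to_download(machine_name):
    """Name the installer flavor that matches the OS, not the CPU."""
    return "64-bit" if os_bits(machine_name) == 64 else "32-bit"


if __name__ == "__main__":
    name = platform.machine()
    print(name, "->", installer_to_download(name))
```

Anything the sketch does not recognize, including an empty machine name, falls back to 32-bit, which is the safer download to try first since 64-bit software will not run on a 32-bit system at all.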
I hope you find this useful and by no means do I mean this is the only way to do it.
1. When opening your box, go through the packing list and check that everything listed is actually there. Most of the time it is, but sometimes something is missing. Note that some manufacturers leave out the system recovery CD/DVD as a matter of normal business.
2. Take the big picture page and hook the computer up according to it. Most systems have color-coded connections now, so it is not as hard as it used to be to hook the system up right the first time. If you are color blind and don’t have any idea how it is to be hooked up, please do yourself a real big favor and get a kid or someone who is not color blind to help. On some systems, hooking things up wrong could put power into an output jack, and depending on what it is, that could burn out the whole computer and make it useless, so be very careful. Laptops are easier to hook up, as there are a lot fewer connections to make, but those connections are also most likely not color coded.
3. When you have everything hooked up properly, set up where it is going to go and comfortable for you, turn the power on (you do have it plugged into a power strip, right?). Turn the power strip on first. You may see some lights flicker on the system, but most of the time that is normal and nothing to worry about. With the power strip on, turn on the monitor, then the tower or main computer, and watch all the lights flash and then go out. Some might come back on and stay on depending on your computer, but there should be a power light and a hard drive light that come on and stay on as long as there is power to the system.
4. When it comes up, it will start by asking some questions and having you fill out some info. Take your time doing this. When it asks you to create the user name and password for the system, be sure to pick ones you will be able to remember. If you forget them and never made a backup disk, you will not be able to get into the system with that account anymore, and anything you encrypted will be lost unless you want to pay someone big money to try to get it back, with no guarantee that it will work. When creating a password, take a phrase that you can remember easily and use parts of it to make the password, so it is more complex and harder to break. Do not use your address, SSN, family member names or anything else that just about anybody could find out about you with a quick search or by asking a neighbor.
5. The first thing to do is make the recovery CDs so that if something goes wrong and your system gets totally messed up, you can reset it to the way the factory sent it out. Make sure you have at least 3 DVDs available, as that is probably how many you will need for the recovery disks. Most of the time the manual will tell you how many, or at least the program will when you start the process. The manual should also tell you exactly how to do this, and I would do it before anything else. There will likely be an icon on the desktop to start the program, or you will have to find it in the programs listing, most likely under the computer manufacturer’s name, though sometimes it is called something else. Also make a password reset disk so that if you do by chance forget the password, you can reset it and still get back into the computer and get to everything you have there, like encrypted files.
6. After you have made the restore DVDs, make sure they work by booting the computer from them; they will do you no good if they cannot boot the computer. They should work, though. Most of the time the manufacturer has set things up so you can only make one set of recovery disks, no matter what, so if they don’t boot, you cannot make any more. If they don’t work, call the computer manufacturer’s toll-free number and talk to them now, as the longer you wait, the less likely it is that they will or can help you out.
7. Now is the time to get the USB drive or CD with all the software on it, the one I had you make before you started, and put it into the slot. Wait for it to come up. The computer will install the driver first and tell you when it is installed, and then a window will open (similar to the first picture below) asking what you want to do with it. Towards the bottom of the window there should be an option to open the folder to view files. Click this option and you should get a window like the second picture following, but your screen will show what you have on the drive (the picture lists what is on mine).
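Going back to the password advice in step 4, here is a small sketch of what "take parts of a phrase" could look like, written in Python purely as an illustration. The function name and the rule it uses are assumptions for this example, not a recommendation from any security standard: words made only of letters contribute their first letter, and words containing digits or punctuation are kept whole.

```python
def password_from_phrase(phrase):
    """Build a password from parts of a memorable phrase.

    Words made only of letters contribute their first letter; words that
    contain digits or punctuation are kept whole.
    """
    parts = []
    for word in phrase.split():
        parts.append(word[0] if word.isalpha() else word)
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical phrase for illustration only; do not reuse it.
    print(password_from_phrase("My first car cost 900 dollars in 1987!"))
    # prints "Mfcc900di1987!"
```

Pick a phrase of your own that nobody could guess from a quick search about you, and do not use the example phrase shown here.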